Training an algorithm on very few data points (such as 1, 2, or 3) will easily yield 0 error, because we can always find a quadratic curve that passes through exactly that many points. Hence:
- As the training set gets larger, the error for a quadratic function increases.
- The training error will plateau after a certain training set size.
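As a concrete illustration, here is a minimal sketch (not from the original notes; the synthetic data and model choice are assumptions) that fits a quadratic to growing training subsets and prints the training cost, which starts at 0 for m = 3 and then rises toward a plateau:

```python
# Illustrative sketch: quadratic fit on growing training subsets.
# The data generation (quadratic signal + Gaussian noise) is an assumption.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = x**2 + rng.normal(0, 0.3, 100)          # quadratic signal + noise

for m in [3, 5, 10, 25, 50, 100]:
    coeffs = np.polyfit(x[:m], y[:m], deg=2)                 # quadratic hypothesis
    j_train = np.mean((np.polyval(coeffs, x[:m]) - y[:m]) ** 2) / 2
    print(f"m={m:3d}  J_train={j_train:.4f}")                # 0 at m=3, then plateaus
```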
Experiencing High Bias (Underfitting)
- Small training set causes:
  - Cost of the training set, J_train(Θ), to be low
  - Cost of the cross-validation set, J_CV(Θ), to be high
- Large training set causes:
  - Cost of the training set, J_train(Θ), to be high
  - Cost of the cross-validation set, J_CV(Θ), to be high
  - J_train(Θ) ≈ J_CV(Θ)
High Bias (Underfitting) Summary:
If a learning algorithm is suffering from high bias, getting more training data will NOT (by itself) help much.
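A minimal sketch of this behaviour, assuming an underfit (linear) hypothesis on quadratic data: both costs converge to a similar, high value, so adding data does not help.

```python
# Illustrative high-bias learning curve: a straight line fit to quadratic
# data (both the data and the model choice are assumptions for this sketch).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 400)
y = x**2 + rng.normal(0, 0.3, 400)
x_tr, y_tr, x_cv, y_cv = x[:300], y[:300], x[300:], y[300:]

def cost(c, xs, ys):
    """Squared-error cost: J = (1/2m) * sum of squared residuals."""
    return np.mean((np.polyval(c, xs) - ys) ** 2) / 2

for m in [5, 10, 25, 50, 100, 200, 300]:
    c = np.polyfit(x_tr[:m], y_tr[:m], deg=1)     # underfit: linear model
    print(f"m={m:3d}  J_train={cost(c, x_tr[:m], y_tr[:m]):.3f}  "
          f"J_cv={cost(c, x_cv, y_cv):.3f}")      # both end high, roughly equal
```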

Experiencing High Variance (Overfitting)
- Small training set causes:
  - Cost of the training set, J_train(Θ), to be low
  - Cost of the cross-validation set, J_CV(Θ), to be high
- Large training set causes:
  - Cost of the training set, J_train(Θ), to increase with training set size
  - Cost of the cross-validation set, J_CV(Θ), to continue decreasing without leveling off
  - J_train(Θ) < J_CV(Θ), but the difference between them remains significant
High Variance (Overfitting) Summary:
If a learning algorithm is suffering from high variance, getting more training data is likely to help.
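And a matching sketch of the high-variance case, assuming an overfit (high-degree polynomial) hypothesis on the same kind of data: J_train(Θ) stays low, J_CV(Θ) starts much higher, and the gap shrinks as m grows.

```python
# Illustrative high-variance learning curve: a degree-12 polynomial fit
# to quadratic data (data and model choice are assumptions for this sketch).
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 400)
y = x**2 + rng.normal(0, 0.3, 400)
x_tr, y_tr, x_cv, y_cv = x[:300], y[:300], x[300:], y[300:]

def cost(c, xs, ys):
    """Squared-error cost: J = (1/2m) * sum of squared residuals."""
    return np.mean((np.polyval(c, xs) - ys) ** 2) / 2

for m in [20, 40, 80, 160, 300]:
    c = np.polyfit(x_tr[:m], y_tr[:m], deg=12)    # overfit: degree-12 polynomial
    print(f"m={m:3d}  J_train={cost(c, x_tr[:m], y_tr[:m]):.3f}  "
          f"J_cv={cost(c, x_cv, y_cv):.3f}")      # gap narrows as m grows
```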