Training an algorithm on very few data points (such as 1, 2, or 3) will easily yield 0 error, because we can always find a quadratic curve that passes through exactly that many points. Hence:
- As the training set gets larger, the error for a quadratic function increases.
- The training error will plateau after a certain training set size.
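As a concrete illustration, here is a minimal sketch (not from the original notes; the synthetic data and model choice are assumptions) that fits a quadratic to growing training subsets and prints the training cost, which starts at 0 for m = 3 and then rises toward a plateau:

```python
# Illustrative sketch: quadratic fit on growing training subsets.
# The data generation (quadratic signal + Gaussian noise) is an assumption.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = x**2 + rng.normal(0, 0.3, 100)          # quadratic signal + noise

for m in [3, 5, 10, 25, 50, 100]:
    coeffs = np.polyfit(x[:m], y[:m], deg=2)                 # quadratic hypothesis
    j_train = np.mean((np.polyval(coeffs, x[:m]) - y[:m]) ** 2) / 2
    print(f"m={m:3d}  J_train={j_train:.4f}")                # 0 at m=3, then plateaus
```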
Experiencing High Bias (Underfitting)
- Small training set causes:
  - Cost of the training set, J_train(Θ), to be low
  - Cost of the cross-validation set, J_CV(Θ), to be high
- Large training set causes:
  - Cost of the training set, J_train(Θ), to be high
  - Cost of the cross-validation set, J_CV(Θ), to be high
  - J_train(Θ) ≈ J_CV(Θ)
High Bias (Underfitting) Summary:
If a learning algorithm is suffering from high bias, getting more training data will NOT (by itself) help much.
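A minimal sketch of this behaviour, assuming an underfit (linear) hypothesis on quadratic data: both costs converge to a similar, high value, so adding data does not help.

```python
# Illustrative high-bias learning curve: a straight line fit to quadratic
# data (both the data and the model choice are assumptions for this sketch).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 400)
y = x**2 + rng.normal(0, 0.3, 400)
x_tr, y_tr, x_cv, y_cv = x[:300], y[:300], x[300:], y[300:]

def cost(c, xs, ys):
    """Squared-error cost: J = (1/2m) * sum of squared residuals."""
    return np.mean((np.polyval(c, xs) - ys) ** 2) / 2

for m in [5, 10, 25, 50, 100, 200, 300]:
    c = np.polyfit(x_tr[:m], y_tr[:m], deg=1)     # underfit: linear model
    print(f"m={m:3d}  J_train={cost(c, x_tr[:m], y_tr[:m]):.3f}  "
          f"J_cv={cost(c, x_cv, y_cv):.3f}")      # both end high, roughly equal
```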

Experiencing High Variance (Overfitting)
- Small training set causes:
  - Cost of the training set, J_train(Θ), to be low
  - Cost of the cross-validation set, J_CV(Θ), to be high
- Large training set causes:
  - Cost of the training set, J_train(Θ), to increase with training set size
  - Cost of the cross-validation set, J_CV(Θ), to continue decreasing without leveling off
  - J_train(Θ) < J_CV(Θ), but the difference between them remains significant
High Variance (Overfitting) Summary:
If a learning algorithm is suffering from high variance, getting more training data is likely to help.
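And a matching sketch of the high-variance case, assuming an overfit (high-degree polynomial) hypothesis on the same kind of data: J_train(Θ) stays low, J_CV(Θ) starts much higher, and the gap shrinks as m grows.

```python
# Illustrative high-variance learning curve: a degree-12 polynomial fit
# to quadratic data (data and model choice are assumptions for this sketch).
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 400)
y = x**2 + rng.normal(0, 0.3, 400)
x_tr, y_tr, x_cv, y_cv = x[:300], y[:300], x[300:], y[300:]

def cost(c, xs, ys):
    """Squared-error cost: J = (1/2m) * sum of squared residuals."""
    return np.mean((np.polyval(c, xs) - ys) ** 2) / 2

for m in [20, 40, 80, 160, 300]:
    c = np.polyfit(x_tr[:m], y_tr[:m], deg=12)    # overfit: degree-12 polynomial
    print(f"m={m:3d}  J_train={cost(c, x_tr[:m], y_tr[:m]):.3f}  "
          f"J_cv={cost(c, x_cv, y_cv):.3f}")      # gap narrows as m grows
```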