Model Selection and Training

Just because a learning algorithm fits a training set well does not mean it is a good hypothesis.
It could overfit, and as a result its predictions on the test set would be poor.
The error of your hypothesis as measured on the data set with which you trained the parameters is likely to be lower than its error on any other data set.
Given many models with different polynomial degrees, we can use a systematic approach to identify the 'best' function.
To choose the model for your hypothesis, you can try each polynomial degree and compare the resulting errors.
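As a rough illustration of the training-error point above, the Python sketch below fits polynomials of degree 1 to 8 to a small synthetic training set and prints the squared-error cost on that training set and on a separate test set. The data (a noisy sin(3x)), the helper names `fit_poly`, `predict`, `cost` and `make_data`, and the use of the normal equations as the fitting method are assumptions for illustration only; the code uses just the standard library.

```python
import math
import random


def fit_poly(xs, ys, degree):
    """Least-squares fit of a polynomial of the given degree (normal equations)."""
    n = degree + 1
    # Normal equations (X^T X) theta = X^T y, where X[k][i] = xs[k] ** i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[p] = a[p], a[col]
        b[col], b[p] = b[p], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    theta = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(a[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (b[i] - rest) / a[i][i]
    return theta


def predict(theta, x):
    """Hypothesis h(x) = theta_0 + theta_1 x + ... + theta_d x^d."""
    return sum(t * x ** i for i, t in enumerate(theta))


def cost(theta, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def make_data(n, rng):
    # Hypothetical data: y = sin(3x) plus Gaussian noise.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [math.sin(3 * x) + rng.gauss(0.0, 0.2) for x in xs]
    return xs, ys


rng = random.Random(0)
x_train, y_train = make_data(15, rng)
x_test, y_test = make_data(200, rng)

train_costs, test_costs = {}, {}
for d in range(1, 9):
    theta = fit_poly(x_train, y_train, d)
    train_costs[d] = cost(theta, x_train, y_train)
    test_costs[d] = cost(theta, x_test, y_test)
    print(f"d={d}  J_train={train_costs[d]:.4f}  J_test={test_costs[d]:.4f}")
```

Because each higher-degree polynomial can represent every lower-degree one, the training cost can only go down (up to rounding) as d grows, while the test cost usually stops improving at some point and may rise again.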

The problem with choosing a model based only on its test set error

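If we choose the polynomial degree by comparing test set errors instead of using a cross validation set, the test set error of the chosen model (here, the d = 5 model) is likely to be an overly optimistic estimate of how well it generalizes, because the degree d has effectively been fit to the test set.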
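Writing the selection step out makes the source of the optimism explicit. Assuming the usual squared-error cost, with Θ^(d) the parameters trained on the training set for degree d, choosing d by test error means the reported error is a minimum over all candidates:

```latex
J_{test}(\Theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\Theta\big(x^{(i)}_{test}\big) - y^{(i)}_{test} \right)^2

\hat{d} = \arg\min_{d} \; J_{test}\big(\Theta^{(d)}\big)

J_{test}\big(\Theta^{(\hat{d})}\big) = \min_{d} \; J_{test}\big(\Theta^{(d)}\big)
```

Since the test set was used to pick d, this minimum tends to understate the error the chosen model will make on data it has never seen.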

One way to break down our dataset into the three sets is:

Training set: 60%
Cross validation set: 20%
Test set: 20%
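A minimal sketch of the 60/20/20 split, assuming the examples can be shuffled freely (i.e. they are not a time series). The function name `split_60_20_20` and the fixed seed are illustrative choices, not part of the original notes; integer arithmetic is used so the split sizes do not depend on floating-point rounding.

```python
import random


def split_60_20_20(examples, seed=0):
    """Shuffle, then split into training (60%), cross validation (20%), test (20%)."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    m = len(items)
    n_train = m * 60 // 100
    n_cv = m * 20 // 100
    train = items[:n_train]
    cv = items[n_train:n_train + n_cv]
    test = items[n_train + n_cv:]  # any rounding leftovers end up here
    return train, cv, test


train, cv, test = split_60_20_20(range(100))
print(len(train), len(cv), len(test))  # 60 20 20
```

Shuffling before slicing matters when the original data are stored in some order (for example, sorted by x), since otherwise each set would cover a different region of the input space.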

How to select the best model

We can now calculate three separate error values for the three different sets using the following method:

Optimize the parameters in Θ using the training set for each polynomial degree.
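A minimal sketch of the whole procedure, assuming the usual completion of these steps: after optimizing Θ on the training set for each degree, choose the degree d with the lowest cross validation error, then use the test set once to estimate the generalization error of that single model. The helpers `fit_poly`, `cost` and `select_degree`, the normal-equations fit, and the synthetic noisy sin(3x) data are hypothetical illustrations; only the standard library is used.

```python
import math
import random


def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (X^T X) theta = X^T y."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        p = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[p] = a[p], a[col]
        b[col], b[p] = b[p], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    theta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        rest = sum(a[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (b[i] - rest) / a[i][i]
    return theta


def cost(theta, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    def h(x):
        return sum(t * x ** i for i, t in enumerate(theta))
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


def select_degree(train, cv, test, max_degree=8):
    """Fit Theta on the training set for each degree, pick the degree with the
    lowest cross validation cost, and report that model's test set cost."""
    x_tr, y_tr = zip(*train)
    x_cv, y_cv = zip(*cv)
    x_te, y_te = zip(*test)
    thetas = {d: fit_poly(x_tr, y_tr, d) for d in range(1, max_degree + 1)}
    cv_costs = {d: cost(t, x_cv, y_cv) for d, t in thetas.items()}
    best_d = min(cv_costs, key=cv_costs.get)
    # The test set is touched only once, after d has been chosen.
    return best_d, cv_costs, cost(thetas[best_d], x_te, y_te)


# Hypothetical data: y = sin(3x) plus noise; points are i.i.d., so slicing
# gives a random 60/20/20 split.
rng = random.Random(1)
xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
data = [(x, math.sin(3 * x) + rng.gauss(0.0, 0.2)) for x in xs]
train, cv, test = data[:60], data[60:80], data[80:]

best_d, cv_costs, test_cost = select_degree(train, cv, test)
print(f"chosen d={best_d}  J_cv={cv_costs[best_d]:.4f}  J_test={test_cost:.4f}")
```

Keeping the test set out of the choice of d is the point of the design: the cross validation cost absorbs the optimism of selection, so the single test set cost remains a fairer estimate of generalization error.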