Bias and Variance

Diagnosing bias and variance:

To judge whether the algorithm is doing well, we can evaluate its bias and variance.

  1. How well the model does on the training data tells us about bias: if it underfits, the training loss will be high, indicating high bias.
  2. How well the model does on the cross-validation data tells us about variance: if it overfits, the loss on the validation set will be much higher than the (low) training loss, indicating high variance.
  3. If the training loss is low and the validation loss is close to it, the model has neither high bias nor high variance, indicating that it is a good fit for our data (see the sketch after this list).
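A minimal sketch of this check, assuming scikit-learn and an existing train/cross-validation split; the `baseline_loss` threshold and the factor of 2 are illustrative assumptions, not fixed rules.

```python
# Sketch: compare training and cross-validation loss to flag high bias / variance.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def diagnose(model, X_train, y_train, X_cv, y_cv, baseline_loss=1.0):
    """Fit the model, then compare Loss_train and Loss_cv.

    `baseline_loss` is a hypothetical level of acceptable loss (e.g. human-level
    error); what counts as "high" is problem-specific.
    """
    model.fit(X_train, y_train)
    loss_train = mean_squared_error(y_train, model.predict(X_train))
    loss_cv = mean_squared_error(y_cv, model.predict(X_cv))

    high_bias = loss_train > baseline_loss      # underfits even the training data
    high_variance = loss_cv > 2 * loss_train    # much worse on unseen data
    return loss_train, loss_cv, high_bias, high_variance
```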

Usually the training loss and the cross-validation loss will be fairly close to each other. But if we fit a high-order polynomial to a small dataset, the model has high variance and Loss_cv >> Loss_train.

  1. Sometimes we end up with high bias and high variance at the same time. This does not typically happen with a simple linear model in a single variable. The intuition is that the model fits part of the training data really well but underfits the rest: the underfit portion causes high bias, while Loss_cv being much higher than Loss_train indicates high variance.

[Plot: Loss_cv and Loss_train versus degree of polynomial]
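A sketch of the experiment behind this plot, assuming scikit-learn; the small synthetic dataset and the degree range are assumptions for illustration only.

```python
# Sweep polynomial degree and record Loss_train and Loss_cv for each fit.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))                 # small dataset
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=30)  # noisy target

X_train, X_cv, y_train, y_cv = train_test_split(X, y, test_size=0.4, random_state=0)

for degree in range(1, 11):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    loss_train = mean_squared_error(y_train, model.predict(X_train))
    loss_cv = mean_squared_error(y_cv, model.predict(X_cv))
    # Low degree: both losses high (high bias).
    # High degree: Loss_cv >> Loss_train (high variance).
    print(f"degree={degree:2d}  Loss_train={loss_train:.3f}  Loss_cv={loss_cv:.3f}")
```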