Train Error and Test Error
You already know that supervised learning splits data into a training set and a test set. The training set is what the model learns from. The test set is held back and used to measure real-world performance.
These two sets give you two error numbers:
- Train error — the model's error on the examples it was trained on. Measures how well the model fits the data it has already seen.
- Test error — the model's error on the held-out examples. Measures how well it generalises to new, unseen data.
The gap between the two is the most useful signal in model evaluation. A small gap alongside low errors means the model learned something real. A large gap means it memorised the training data without learning the underlying pattern.
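The gap is easy to measure directly. Here is a minimal sketch on a synthetic noisy sine dataset (the data, polynomial degree, and split are all illustrative choices, not a recipe): fit on the training half only, then score both halves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: a noisy sine curve, split into train and test halves.
x = rng.uniform(0, 3, 200)
y = np.sin(2 * x) + rng.normal(0, 0.2, 200)
x_train, y_train = x[:100], y[:100]
x_test, y_test = x[100:], y[100:]

# Fit a fairly flexible model (degree-9 polynomial) on the training half only.
coeffs = np.polyfit(x_train, y_train, deg=9)

def mse(xs, ys):
    # Mean squared error of the fitted polynomial on the given examples.
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

train_error = mse(x_train, y_train)
test_error = mse(x_test, y_test)
gap = test_error - train_error
print(f"train MSE: {train_error:.3f}, test MSE: {test_error:.3f}, gap: {gap:.3f}")
```

The same two numbers drive every diagnosis in the rest of this article: it is the pair, not either one alone, that tells you what is wrong.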
What Is the Bias-Variance Tradeoff?
Every supervised model makes errors. Those errors come from two distinct sources: bias and variance. Understanding which one is hurting your model tells you exactly what to fix.
- Bias — error from wrong assumptions. A high-bias model is too simple; it cannot capture the true pattern in the data and performs poorly even on training data.
- Variance — error from sensitivity to noise. A high-variance model is too complex; it memorises the training data (including its noise) and fails on new examples.
The tradeoff is that reducing one tends to increase the other. Increasing model complexity lowers bias but raises variance. The goal is the sweet spot where total error is minimised.
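The sweet spot can be found empirically by sweeping model complexity and watching test error. This sketch (synthetic data again, with polynomial degree standing in for "complexity") shows test error falling while bias dominates, then rising as variance takes over.

```python
import numpy as np

rng = np.random.default_rng(4)

# Noisy sine data, split alternately into train and test halves.
x = np.sort(rng.uniform(0, 3, 120))
y = np.sin(2 * x) + rng.normal(0, 0.25, 120)
x_tr, y_tr = x[::2], y[::2]
x_te, y_te = x[1::2], y[1::2]

# Sweep complexity (polynomial degree) and record held-out error for each.
test_errors = {}
for degree in range(1, 13):
    c = np.polyfit(x_tr, y_tr, deg=degree)
    test_errors[degree] = np.mean((np.polyval(c, x_te) - y_te) ** 2)

# The degree with the lowest test error is the sweet spot for this dataset.
best = min(test_errors, key=test_errors.get)
print(f"sweet spot at degree {best}")
```

Degree 1 underfits a sine curve, so the minimum lands at some higher degree; on a larger or noisier dataset the exact sweet spot would move.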
High Bias (underfitting): model too simple → high train error AND high test error
High Variance (overfitting): model too complex → low train error BUT high test error
Diagnosing Bias vs Variance
The fastest diagnostic is to compare training error and test error side by side.
| Symptom | Train Error | Test Error | Diagnosis |
|---|---|---|---|
| Both errors high | High | High | High bias — underfitting |
| Big gap between them | Low | High | High variance — overfitting |
| Both errors low | Low | Low | Well-fitted model |
| Both errors medium | Medium | Medium | May need more data or a better model |
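The table above can be sketched as a small helper function. The thresholds here (0.1 for "high error", 0.05 for "large gap") are hypothetical; in practice they depend entirely on your task and metric.

```python
# Minimal sketch of the diagnostic table. Thresholds are illustrative only.
def diagnose(train_error, test_error, high=0.1, gap=0.05):
    if train_error > high and test_error > high:
        return "high bias (underfitting)"
    if test_error - train_error > gap:
        return "high variance (overfitting)"
    return "well-fitted"

print(diagnose(0.30, 0.32))  # both errors high
print(diagnose(0.01, 0.39))  # big gap between them
print(diagnose(0.02, 0.04))  # both errors low
```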
For example, a model that scores 99% on training data but only 61% on the test set has low bias but high variance — it memorised the training set instead of learning generalisable patterns.
A model scores 99% on training data but only 61% on the test set. What is the most likely problem?
What Causes Each?
High Bias (Underfitting)
A model underfits when it is not expressive enough to represent the true relationship in the data. Common causes include choosing a model that is too simple for the problem, insufficient training time, or over-aggressive regularisation that constrains the model too tightly.
- For example, fitting a straight line (linear regression) to data that follows a curved pattern will always underfit — the model cannot bend to match the data no matter how much it trains.
- Fixes: use a more complex model, add more features, reduce regularisation strength.
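The straight-line example is easy to reproduce. Below, a linear fit to synthetic quadratic data underfits badly, while a slightly more expressive model (one of the fixes above) captures the curve. The data and degrees are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Curved ground truth (y = x^2) plus a little noise.
x = np.linspace(-3, 3, 200)
y = x**2 + rng.normal(0, 0.3, 200)

def train_mse(degree):
    # Fit a polynomial of the given degree and score it on the same data.
    coeffs = np.polyfit(x, y, deg=degree)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

linear_err = train_mse(1)  # straight line: cannot bend to match the curve
cubic_err = train_mse(3)   # more complex model: captures the pattern
print(f"linear train MSE: {linear_err:.2f}, cubic train MSE: {cubic_err:.2f}")
```

Note that the linear model's error is high on the *training* data itself, which is the signature of bias: no amount of extra training fixes a model that cannot represent the pattern.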
High Variance (Overfitting)
A model overfits when it learns the training data too well — including random noise that does not reflect the real-world pattern. This happens with very complex models trained on small datasets.
- For example, a deep decision tree with no depth limit will perfectly classify every training example but fail badly on new data because it has essentially memorised the training set.
- Fixes: regularisation (L1/L2, dropout), more training data, early stopping, cross-validation to detect it early.
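The overfitting pattern can be sketched without any ML library. Here a very high-degree polynomial plays the role of the unlimited-depth tree: on a small noisy dataset it can pass through every training point, while a capped degree plays the same role as a depth limit. All sizes and degrees are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Small noisy dataset: the setting where complex models memorise noise.
x = np.sort(rng.uniform(0, 1, 30))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 30)
x_train, y_train = x[::2], y[::2]   # 15 training points
x_test, y_test = x[1::2], y[1::2]   # 15 held-out points

def errors(degree):
    # Train and test MSE for a polynomial of the given degree.
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

# Degree 14 can thread through all 15 training points: near-zero train
# error, but wild oscillations between them inflate test error.
over_train, over_test = errors(14)
# A capacity limit (degree 3) is the analogue of a tree depth limit.
simple_train, simple_test = errors(3)
print(f"deg 14: train {over_train:.4f}, test {over_test:.4f}")
print(f"deg  3: train {simple_train:.4f}, test {simple_test:.4f}")
```

The complex model "wins" on train error and loses on test error, which is exactly the big-gap row of the diagnostic table.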
A linear model achieves 58% accuracy on training data and 57% on the test set. What does this suggest?
The Total Error Equation
Total error can be decomposed into three parts:
Total Error = Bias² + Variance + Irreducible Noise
- Bias² — how far the average prediction is from the truth.
- Variance — how much predictions scatter around their average.
- Irreducible noise — randomness inherent in the data that no model can remove.
The irreducible noise sets a floor on how well any model can do. Optimising means reducing bias² + variance — and because they pull in opposite directions, there is always a tradeoff.
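The decomposition can be checked numerically with a Monte Carlo sketch: train the same model class on many independent noisy training sets and look at its predictions at one test point. The true function, noise level, and model (a straight-line fit) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# True function f(x) = sin(2x); observations add N(0, sigma^2) noise,
# so the irreducible noise is sigma^2. The model is a degree-1 fit.
f = lambda x: np.sin(2 * x)
sigma = 0.3
x0 = 1.0                      # the single test point we decompose at
x = np.linspace(0, 3, 40)

preds = []
for _ in range(2000):         # many independent training sets
    y = f(x) + rng.normal(0, sigma, x.size)
    coeffs = np.polyfit(x, y, deg=1)
    preds.append(np.polyval(coeffs, x0))
preds = np.array(preds)

bias_sq = (preds.mean() - f(x0)) ** 2  # (average prediction - truth)^2
variance = preds.var()                 # scatter around the average
noise = sigma ** 2                     # floor no model can remove

# Expected squared error against fresh noisy observations at x0 should
# match bias^2 + variance + noise.
y0 = f(x0) + rng.normal(0, sigma, preds.size)
total = np.mean((preds - y0) ** 2)
print(f"bias^2={bias_sq:.4f} variance={variance:.4f} noise={noise:.4f}")
print(f"sum={bias_sq + variance + noise:.4f} empirical total={total:.4f}")
```

For a straight line fitted to a sine curve, bias² dominates; swapping in a higher-degree model would shrink bias² and grow variance, which is the tradeoff in action.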
Which component of total error cannot be reduced no matter how good the model is?
