What Is Supervised Learning?
Supervised learning is the most widely used type of machine learning. The algorithm is given a set of training examples — each one is a pair of an input x and the correct output y — and it learns a function f such that f(x) ≈ y.
Once trained, the model can produce a reasonable output for inputs it has never seen before. The key word is supervised: a teacher (the labeled training data) tells the algorithm the right answer during training. After training, no teacher is needed.
Supervised learning in one line: f(x) ≈ y — x is the input (features), y is the correct label provided only during training, and f is the function the algorithm learns.
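The pattern can be sketched in a few lines of code. Here f is a straight line fit by ordinary least squares; the data points are invented for illustration, and real problems would use many more features and examples.

```python
# A minimal sketch of the supervised pattern: fit f from (x, y) pairs,
# then predict y for an input never seen during training.

def fit_line(xs, ys):
    """Learn slope w and intercept b so that w*x + b ≈ y (least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return w, b

# Training examples: inputs x paired with correct outputs y (the "teacher").
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.0, 8.1]   # roughly y = 2x

w, b = fit_line(xs, ys)
prediction = w * 5.0 + b     # f applied to a new input, x = 5
```

After fitting, no labels are needed: the learned w and b alone produce predictions for new inputs.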
The Same Pattern, Every Time
Every supervised learning problem follows the same structure, regardless of domain. You provide labeled examples — the algorithm generalizes from them.
Whether the task is spam filtering, crop-yield prediction, or medical diagnosis, notice what all such problems have in common:
- There is always an input x (email text, farm conditions, patient data).
- There is always a correct label y that was known at training time (a human or process provided the answer).
- Once trained, the model predicts y for new x values it was never shown.
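The three bullets above map directly onto code. This sketch uses a 1-nearest-neighbour classifier on invented two-number feature vectors (imagine link count and body length for an email); the features, labels, and distance choice are all illustrative assumptions.

```python
# Labeled (x, y) pairs go in; a prediction for a new x comes out.

def predict(train, x):
    """Return the label of the training example whose features are closest to x."""
    def dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = min(train, key=lambda pair: dist(pair[0], x))
    return nearest[1]

# Each training example pairs an input x with its known label y.
train = [
    ([5.0, 1.0], "spam"),      # e.g. many links, short body
    ([4.5, 0.8], "spam"),
    ([0.2, 6.0], "not spam"),  # few links, long body
    ([0.1, 5.5], "not spam"),
]

label = predict(train, [4.8, 1.2])  # an input the model was never shown
```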
A model is trained on thousands of house listings, each with features (size, location, age) and a known sale price. Which paradigm is this?
Two Types of Supervised Learning
Supervised learning splits into two types based on what kind of output y you are predicting.
| Type | Output y | Example |
|---|---|---|
| Regression | A continuous number — any value on a scale | Crop yield in tonnes, house price in dollars |
| Classification | One label from a fixed set of categories | Spam / Not Spam, Disease / Healthy |
The distinction matters because regression and classification use different algorithms, different loss functions, and are evaluated with different metrics. The next two modules cover each in full.
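The difference in output type is easy to see side by side. Both functions below are toy sketches with made-up coefficients and thresholds, not real models: one returns a continuous number (regression), the other a label from a fixed set (classification).

```python
def predict_yield_tonnes(rainfall_mm, fertilizer_kg):
    """Regression: the output is a continuous number on a scale."""
    # Coefficients are invented for illustration.
    return 0.01 * rainfall_mm + 0.05 * fertilizer_kg

def classify_email(link_count):
    """Classification: the output is one label from a fixed set."""
    # The threshold of 3 links is an arbitrary illustrative choice.
    return "spam" if link_count > 3 else "not spam"
```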
A model predicts tomorrow's temperature in °C. Is this regression or classification?
Training, Validation, and Test Sets
A supervised learning workflow splits data into three parts, each serving a different role.
- Training set — used to fit the model's parameters. The algorithm sees both x and y for every example here.
- Validation set — used to tune hyperparameters and catch overfitting during development. The model never trains on this data.
- Test set — held out until the very end. Used once to estimate real-world performance. Touching it repeatedly during development gives an overoptimistic picture.
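The three-way split above can be sketched as a shuffled division of the data. The 60/20/20 proportions used here are a common convention, not a rule, and the seed is fixed only so the split is reproducible.

```python
import random

def split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle data and split it into train, validation, and test lists."""
    shuffled = data[:]                      # copy so the original is untouched
    random.Random(seed).shuffle(shuffled)   # shuffle before splitting
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # everything left over
    return train, val, test

train, val, test = split(list(range(100)))
```

Shuffling first matters: if the data were ordered (say, by date or by label), an unshuffled split would give the three sets systematically different examples.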
Hyperparameters are configuration values set before training — for example, learning rate, number of layers, or regularization strength. They are not learned from data; they are chosen by the developer, often using the validation set as feedback. We will cover them in detail in later modules.
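As a toy analogy for tuning with validation feedback, the sketch below picks the best value of a classifier's threshold setting by measuring accuracy on a held-out validation set. The classifier, threshold candidates, and data are all invented for illustration.

```python
def classify(link_count, threshold):
    """Toy classifier whose behaviour depends on a tunable setting."""
    return "spam" if link_count > threshold else "not spam"

def accuracy(examples, threshold):
    """Fraction of (x, y) examples the classifier gets right."""
    correct = sum(classify(x, threshold) == y for x, y in examples)
    return correct / len(examples)

# Held-out validation examples: the model never trains on these.
validation = [(0, "not spam"), (1, "not spam"), (2, "not spam"),
              (4, "spam"), (6, "spam"), (8, "spam")]

# Try candidate values; keep whichever scores best on validation data.
best = max([1, 2, 3, 5], key=lambda t: accuracy(validation, t))
```

Note that the test set plays no role here: it stays untouched until the final, one-time performance estimate.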
Why should the test set be used only once, at the very end of development?
