Regression

What Is Regression?

Regression is a type of supervised learning in which the output y is a continuous numerical value: a real number rather than a category. The algorithm learns a function f(x) that maps the inputs x to a number that can fall anywhere within a continuous range.

The defining question for regression is: are we predicting a number? If yes, it's regression.

Regression predicts a number. The output y can be any value — 2.3 tonnes, 340,000 dollars, 21.7°C. There are no fixed categories, just a continuous range.

The Crop Yield Problem

Suppose a farmer wants to predict how many tonnes of wheat a field will produce before the harvest. Several factors are available as inputs.

Diagram

Rain falls continuously as the sun arcs across the sky — day and night — feeding into the crop yield model.

The inputs — also called features — are the variables the algorithm uses to make its prediction:

Feature (x)	What it measures
Soil quality	Nutrient content, pH level
Rainfall	Total rainfall over the growing season (mm)
Temperature	Average temperature during crop growth (°C)
Fertilizer	Amount applied (kg/hectare)
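
One way to picture the feature table in code is as a small record with one field per feature, flattened into the input x. The sketch below is illustrative only: the class name FieldFeatures, its field names, and the sample values are assumptions, and soil quality is reduced to a single pH number for brevity.

```python
from dataclasses import dataclass


@dataclass
class FieldFeatures:
    """Hypothetical container for the four input features of one field."""
    soil_ph: float               # soil quality, simplified to pH only
    rainfall_mm: float           # total rainfall over the growing season
    avg_temp_c: float            # average temperature during crop growth
    fertilizer_kg_per_ha: float  # fertilizer applied

    def to_vector(self) -> list[float]:
        # Fixed feature order, so every field yields a comparable input x.
        return [
            self.soil_ph,
            self.rainfall_mm,
            self.avg_temp_c,
            self.fertilizer_kg_per_ha,
        ]


# Made-up values for a single field, for illustration only.
field = FieldFeatures(soil_ph=6.5, rainfall_mm=420.0,
                      avg_temp_c=18.5, fertilizer_kg_per_ha=120.0)
x = field.to_vector()
print(x)
```

Keeping the feature order fixed matters because a learned function f(x) treats each position in x as a specific feature.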

The output y is the crop yield in tonnes per hectare, which is a number. It could be 3.2, 5.7, or 8.1 depending on the conditions. This is why the problem is regression, not classification: we are not predicting a category like "good crop / bad crop"; we are predicting a numerical value on a continuous scale.
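
The difference between the two kinds of target can be sketched in a few lines. The example yields come from the text above, while the helper yield_to_label and its 5.0 threshold are arbitrary assumptions made only to show what a category-style target would look like.

```python
def yield_to_label(yield_t_per_ha: float, threshold: float = 5.0) -> str:
    """Collapse a continuous yield into a category (hypothetical threshold)."""
    return "good crop" if yield_t_per_ha >= threshold else "bad crop"


# Regression targets: continuous values in tonnes per hectare.
yields = [3.2, 5.7, 8.1]

# Classification targets: the same fields reduced to fixed categories.
labels = [yield_to_label(y) for y in yields]

print(yields)  # [3.2, 5.7, 8.1]
print(labels)  # ['bad crop', 'good crop', 'good crop']
```

The labels discard information: 5.7 and 8.1 both become "good crop", whereas the regression target keeps the difference between them.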

Diagram

The crop yield model takes four input features and produces one continuous numerical output — tonnes per hectare.

Quick Check

Why is crop yield prediction a regression problem and not a classification problem?

Which Function Fits the Data?

We have the inputs and the correct outputs from historical farm records. The algorithm's job is to find a function f(x) that maps those inputs to the yield.
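
As a rough sketch of what "finding f(x)" can mean, the example below fits a linear function to synthetic records using ordinary least squares in NumPy. Everything about the data is invented for illustration (the value ranges, the weights true_w and true_b, and the noise level), and a linear function is only one possible choice of f.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Synthetic stand-in for historical farm records (all values invented).
X = np.column_stack([
    rng.uniform(5.5, 7.5, n),   # soil pH
    rng.uniform(200, 600, n),   # rainfall, mm
    rng.uniform(12, 25, n),     # average temperature, deg C
    rng.uniform(50, 200, n),    # fertilizer, kg/hectare
])
true_w = np.array([0.4, 0.008, 0.05, 0.01])  # assumed "true" weights
true_b = -2.0                                # assumed "true" intercept
y = X @ true_w + true_b + rng.normal(0, 0.3, n)  # yield, tonnes/hectare

# Append a column of ones so the intercept is learned with the weights.
A = np.column_stack([X, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = coef[:-1], coef[-1]


def f(x):
    """Learned linear function: maps four features to a predicted yield."""
    return float(np.dot(w, x) + b)


print(f([6.5, 420.0, 18.5, 120.0]))
```

The known inputs X and known outputs y play the role of the historical records; the fitted w and b define the function used for new fields.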

But which kind of function? A straight line through the data? A curve? Something more complex?

Diagram

Scatter plot of rainfall vs crop yield. Multiple functions could fit this data — a straight line, a curve, or something else entirely. Which is best?

This is the central question of regression: choosing the right function family, fitting it to the data, and evaluating how well it generalises to new fields the model has never seen. We will answer this in detail later in the track — covering linear regression, polynomial regression, and how to measure which fit is actually best.
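
A minimal sketch of comparing two function families follows: fit a straight line and a quadratic curve to rainfall data, then score each on records held back from fitting. The data is synthetic (yield is assumed to peak near 550 mm of rain), and mean squared error is used here as one common way to score a fit, not necessarily the measure used later in the track.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic rainfall (mm) and yield (tonnes/hectare); the relationship is invented.
rain = rng.uniform(200, 800, 80)
yield_t = 6.0 - 0.00004 * (rain - 550.0) ** 2 + rng.normal(0, 0.3, 80)

# Rescale rainfall to hundreds of mm to keep the polynomial fit well conditioned.
x = rain / 100.0

# Hold back the last 20 records to stand in for fields the model has not seen.
x_train, x_test = x[:60], x[60:]
y_train, y_test = yield_t[:60], yield_t[60:]


def held_out_mse(degree: int) -> float:
    """Fit a polynomial of the given degree; return its error on held-out data."""
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    pred = np.polyval(coeffs, x_test)
    return float(np.mean((pred - y_test) ** 2))


mse_by_degree = {d: held_out_mse(d) for d in (1, 2)}
for degree, mse in mse_by_degree.items():
    print(f"degree {degree}: held-out MSE = {mse:.3f}")
```

Because this synthetic data was generated from a curve, the quadratic scores better on the held-out records; with real farm data the better family is not known in advance, which is why the comparison is made on unseen data.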

Quick Check

A model is trained to predict house prices. The output is a dollar amount. Which type of supervised learning is this?
