Machine Learning (Beginner)

Linear Regression

Tags: linear regression, supervised learning, training set, features, parameters, hypothesis

From Learning Paradigms to a Real Model

In the previous modules you saw that supervised learning maps inputs to outputs using labeled examples. Regression is the type of supervised learning where the output is a continuous number. You have the theory; now let's build an actual model.

What Is Linear Regression?

Linear regression fits a straight line through your training data. It then uses that line to predict a number for any new input. It is the simplest regression model and the starting point for almost everything that follows.

Do not worry if you did not fully understand that definition. It is just the technical description. We will break it down piece by piece using interactive visuals to help you understand the core concepts of linear regression. Let's start with an example:

The Crop Yield Problem

Say you own farmland in your state. You have 20 years of rainfall and yield data from every major plot in your area. It is sitting right there — ready to be used.

Before the next harvest, you want to know: how much wheat will this plot produce? You want to plan your revenue. You know the rainfall forecast. You just need a model to turn that number into a prediction.

That is exactly what linear regression does. One input — rainfall in millimetres. One output — crop yield in tonnes per hectare.

Input x → Model f → Output ŷ (y-hat)
• x: rainfall in millimetres (what you measure)
• ŷ (y-hat): predicted crop yield in tonnes per hectare (what you want to know)
One input, one output — the simplest form of a regression problem.

Let's go through your dataset. Each row is one growing season. It has one input — the rainfall — and one output — the yield you actually measured. Let's have a look at a small sample of the data below:

| Season | Rainfall x (mm) | Yield y (t/ha) |
|---|---|---|
| 1 | 80 | 2.2 |
| 2 | 120 | 3.9 |
| 3 | 145 | 5.0 |
| 4 | 160 | 5.4 |
| 5 | 185 | 6.1 |
| 6 | 100 | 3.1 |

This collection of (x, y) pairs is your training set. You hand it to the learning algorithm. You can already see a pattern — more rainfall means higher yield.

Quick Check

Looking at the dataset, what does each row represent?

Fitting a Line

So you have the data — what next? You fit the simplest possible function to it: a linear equation. We will model the output as a linear function of the input.

Diagram: Crop Yield vs Rainfall, with the linear regression fit f(x) = 0.032x − 0.34.
Scatter plot of your rainfall records vs crop yield. Each purple dot is one season. The gold line is the linear regression fit.

The line will not pass through every point exactly. That is expected. The goal is not to memorise your past seasons — it is to predict new ones.
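If you are curious how such a line is actually found, here is a minimal sketch using the standard closed-form least-squares formulas for one variable. We will derive these properly in later modules; this just previews the result on the six-row sample table above. Note that the sample alone gives a slightly different line than the w ≈ 0.032, b ≈ −0.34 used in this module, which presumably come from the full records.

```python
# Closed-form least-squares fit for one input variable:
#   w = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², and b = ȳ - w·x̄
# Sample data from the table above: (rainfall mm, yield t/ha)
data = [(80, 2.2), (120, 3.9), (145, 5.0), (160, 5.4), (185, 6.1), (100, 3.1)]

x_mean = sum(x for x, _ in data) / len(data)
y_mean = sum(y for _, y in data) / len(data)

w = sum((x - x_mean) * (y - y_mean) for x, y in data) / \
    sum((x - x_mean) ** 2 for x, _ in data)
b = y_mean - w * x_mean

print(f"w ≈ {w:.4f}, b ≈ {b:.2f}")  # fit on the six-row sample only
```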

A Complete Prediction Example

You have a new plot. Forecast rainfall: 130 mm. Wheat is selling at $500 per tonne. You want a prediction before the harvest begins.

Feed x = 130 into your model: f(130) = 0.032 × 130 − 0.34 ≈ 3.82 t/ha.

| Input | Predicted Yield | Price per Tonne | Expected Revenue |
|---|---|---|---|
| x = 130 mm | ŷ (y-hat) = 3.82 t/ha | $500 | $1,910 |

You have a yield estimate and a revenue figure. Before a single grain is harvested. That is the value of a trained model — it takes your historical records and answers questions about inputs it has never seen.

Quick Check

Using the model f(x) = 0.032x − 0.34, what is the predicted yield for a field with 160 mm of rainfall?

Why Supervised? Why Regression?

Why supervised? Your records contain the actual measured yield for each rainfall value. Those correct labels supervise the learning. Without them, the algorithm has nothing to learn from.

Why regression? The output y is a continuous number. Yield can be 3.1, 4.78, 6.1 — any value on a scale. If you were predicting a category like "good harvest" or "poor harvest", it would be classification instead.

Supervised: you train on (x, y) pairs where y is the known correct answer. Regression: the output y is a continuous number, not a fixed category. These two facts together make this problem supervised regression.

Machine Learning Terminology

Every practitioner uses a standard vocabulary. Here it is, grounded in your crop yield records.

The full collection of (x, y) pairs you train on is called the training set. After training, you no longer need it. The model f captures everything the algorithm learned.

| Term | Symbol | Also called | Example |
|---|---|---|---|
| Input variable | x | feature, input feature | Rainfall in mm |
| Output variable | y | target variable, target | Crop yield in t/ha |
| Number of examples | m | — | hundreds (20 years of records) |
| One training example | (x, y) | — | (145, 5.0) |
| The i-th example | (x(i), y(i)) | — | (x(3), y(3)) = (145, 5.0) |
| Prediction | ŷ | y-hat | ŷ = 4.78 for x = 160 |

For example, (x(3), y(3)) = (145, 5.0) is the third row in your table — 145 mm of rain, 5.0 t/ha of yield. The superscript is an index, not a power.
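In code, that superscript simply becomes a list index. A tiny sketch, using the sample table above (Python lists are 0-indexed, so the third example lives at index 2):

```python
# Training examples as a list of (x, y) pairs, in table order
examples = [(80, 2.2), (120, 3.9), (145, 5.0), (160, 5.4), (185, 6.1), (100, 3.1)]

# (x^(3), y^(3)) — the third example; index 2 because Python counts from 0
x_3, y_3 = examples[2]
print(x_3, y_3)  # 145 5.0
```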

Quick Check

In the notation (x(i), y(i)), what does the superscript i indicate?

How Training Leads to Prediction

All the pieces connect in one pipeline. You feed your training set to the learning algorithm — every (x(i), y(i)) pair. The algorithm produces a function f. That function is your trained model.

Diagram: From Training Data to Prediction. Your training set of (x(i), y(i)) pairs feeds the learning algorithm, which outputs the model f (also called the hypothesis). At inference time, a new rainfall value x goes in and a predicted yield ŷ (y-hat) comes out.

Once f is trained, you no longer need the training set. You pass a new x to f and get a prediction ŷ (y-hat) back.

y is the actual yield — the number you measured in a past season. ŷ (y-hat) is the model's prediction. y is real. ŷ is an estimate.
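To see the difference concretely, here is a small sketch that compares each actual yield y in the sample table with the model's estimate ŷ, using the parameters quoted in this module:

```python
w, b = 0.032, -0.34  # parameters used throughout this module

# (rainfall mm, actual yield t/ha) — the sample table above
data = [(80, 2.2), (120, 3.9), (145, 5.0), (160, 5.4), (185, 6.1), (100, 3.1)]

for x, y in data:
    y_hat = w * x + b  # the model's estimate ŷ
    print(f"x={x:3d}  y={y:.1f}  ŷ={y_hat:.2f}  error y−ŷ={y - y_hat:+.2f}")
```

The errors are small but not zero — exactly the point made above: the line does not pass through every point, and that is fine.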

The Model: f(x) = wx + b

The simplest function f is a straight line. In machine learning, you write it as:

fw,b(x) = wx + b
| Symbol | Name | Also called | Role |
|---|---|---|---|
| x | input feature | — | The value you feed in (e.g., rainfall in mm) |
| w | weight | slope, coefficient | Controls how steeply the line rises |
| b | bias | intercept | Shifts the entire line up or down |
| f(x) | model output | ŷ (y-hat) | The predicted value for input x |
| w, b | parameters | — | What the learning algorithm tunes from data |

For your crop yield records: w ≈ 0.032, b ≈ −0.34. So f(130) = 0.032 × 130 − 0.34 = 3.82 t/ha.

The algorithm's job is to find the w and b that best fit your data. Different (w, b) pairs give different lines. Some fit well. Others miss badly. This is linear regression with one variable — one input, one output, one line.

Quick Check

In fw,b(x) = wx + b, what role does b play?

Explore the Model: Adjust w and b

Use the sliders below to change w (slope) and b (intercept). Watch how the line shifts and tilts in real time. Try to pass the line as close to the purple data points as possible.

Interactive diagram: Crop Yield vs Rainfall, showing fw,b(x) = wx + b with sliders for w (slope, 0.00 to 0.08) and b (intercept, 0 to 5).

Adjust w and b to see how f(x) = wx + b changes. When w is too small the line is too flat; too large and it rises too steeply. The bias b shifts the line up or down.

Notice: w and b work together. Changing w tilts the line. Changing b slides it up or down. Finding the right combination by hand takes trial and error. In the next module, you will learn a formal way to measure how wrong any given line is — the first step toward finding the best one automatically.
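As a preview of that idea, here is a minimal sketch that scores two candidate lines by their mean squared error on the sample data: the average squared gap between actual and predicted yields, where smaller means the line sits closer to the points. The formal cost function J(w, b) is defined in the next module.

```python
# Sample data from the table above: (rainfall mm, yield t/ha)
data = [(80, 2.2), (120, 3.9), (145, 5.0), (160, 5.4), (185, 6.1), (100, 3.1)]

def mean_squared_error(w, b):
    """Average squared gap between actual y and predicted wx + b."""
    return sum((y - (w * x + b)) ** 2 for x, y in data) / len(data)

print(mean_squared_error(0.032, -0.34))  # the line used in this module
print(mean_squared_error(0.010,  2.00))  # a much flatter candidate — scores worse
```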

Implementing the Model in Python

In the next modules, we will dive deep into how the optimal values of w and b are found in machine learning — but first, let us look at the Python code for linear regression.

Below is a complete working implementation. It defines the model, stores your crop yield data as a training set, and prints a prediction for a new field.

```python
def predict(x, w, b):
    """
    Linear regression model: f_w,b(x) = wx + b

    Parameters
    ----------
    x : float  — input feature (rainfall in mm)
    w : float  — weight / slope   (learned parameter)
    b : float  — bias / intercept (learned parameter)

    Returns
    -------
    float — predicted crop yield in tonnes per hectare
    """
    return w * x + b


# Training set: (rainfall mm, yield t/ha)
training_set = [
    (80,  2.2),
    (100, 3.1),
    (120, 3.9),
    (145, 5.0),
    (160, 5.4),
    (185, 6.1),
]

# Parameters learned from the training set
w =  0.032   # slope
b = -0.34    # intercept

# --- Predict for a new field: 130 mm forecast rainfall ---
x_new = 130
y_hat = predict(x_new, w, b)
print(f"Predicted yield: {y_hat:.2f} t/ha")

# Expected revenue at $500 per tonne
price_per_tonne = 500
revenue = y_hat * price_per_tonne
print(f"Expected revenue: ${revenue:,.0f}")

# Output:
# Predicted yield: 3.82 t/ha
# Expected revenue: $1,910
```

The function predict is our model f. The values w and b are the parameters the learning algorithm would find during training. In the next module, we define a cost function — a way to measure how well any given (w, b) pair fits the data — which is the key to finding these parameters automatically.

Test Your Knowledge

Ready to check how much you remember? Take the quiz for Linear Regression and see your score on the leaderboard.

Take the Quiz

Up next

In the next module, we will measure how well a line fits the data and introduce J(w, b), the squared error cost function.

The Cost Function