Why Matrices?
In machine learning, you rarely process one example at a time. You process batches of thousands of examples simultaneously — and the way you organise them is as a matrix. A matrix is a 2D grid of numbers: rows represent examples, columns represent features. Every dataset, every set of predictions, every layer of a neural network is a matrix.
If you understand how data is shaped as a matrix, you will understand why neural network code is written the way it is. Shape errors are the most common bug in ML code — mastering X.shape prevents them.
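To make that concrete, here is a minimal sketch (shapes chosen arbitrarily for illustration) of the kind of dimension mismatch that checking .shape catches early:

```python
import numpy as np

X = np.random.randn(1000, 3)   # 1000 examples, 3 features
W = np.zeros((4, 1))           # wrong: 4 weights for only 3 features

try:
    Z = X @ W                  # inner dimensions 3 and 4 do not match
except ValueError as e:
    print("shape error:", e)

# Checking shapes up front turns this runtime failure into an obvious fix
print(X.shape, W.shape)        # (1000, 3) (4, 1)
```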
The Input Matrix X
The input matrix X holds all your training examples. Each row is one example. Each column is one feature. For example, a dataset of 1,000 house prices with 3 features (size, bedrooms, location score) would be shaped (1000, 3) — 1,000 rows, 3 columns.
import numpy as np
# 1000 training examples, 3 features each
X = np.random.randn(1000, 3)
print(X.shape) # (1000, 3)
print(X.shape[0]) # 1000 — number of examples (m)
print(X.shape[1]) # 3 — number of features (n)

This module uses NumPy throughout — np.array, np.zeros, np.random.randn, .reshape(), and more. We will take a detailed look at all of these in the NumPy Fundamentals module later in this track. For now, focus on the shapes and the data layout.
X.shape — Reading the Dimensions
X.shape returns a tuple (rows, columns). In ML convention, X.shape[0] is m — the number of training examples. X.shape[1] is n — the number of input features. You will read and check .shape constantly to catch dimension mismatches before they cause bugs downstream.
X = np.array([[2.1, 0.5, 1.2],
[3.4, 1.1, 0.8],
[1.7, 2.3, 0.4]])
print(X.shape) # (3, 3) → 3 examples, 3 features
m = X.shape[0] # m = 3
n = X.shape[1] # n = 3

The Output Matrix Y
Y holds the labels — the correct answers for each training example. For a binary classification problem (yes/no, cat/not-cat), Y is a row vector of shape (1, m) — one label per training example, stored as 0 or 1.
# Y for 5 training examples — binary labels
Y = np.array([[1, 0, 1, 1, 0]]) # shape (1, 5)
print(Y.shape) # (1, 5)

Note that Y is shaped (1, m) — a row vector — not (m,) or (m, 1). The (1, m) convention keeps matrix operations consistent when you multiply W · X and add b. Always check Y.shape when debugging.
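To see why the distinction matters, here is a short sketch (labels are hypothetical) converting a flat (m,) array into the (1, m) convention:

```python
import numpy as np

labels = np.array([1, 0, 1, 1, 0])    # shape (5,) — a flat 1D array
print(labels.shape)                    # (5,)

Y = labels.reshape(1, -1)              # shape (1, 5) — an explicit row vector
print(Y.shape)                         # (1, 5)

# A (5, 1) column vector is a different layout again
print(labels.reshape(-1, 1).shape)     # (5, 1)
```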
The Full Dataset Layout
For a supervised learning problem with m training examples and n input features:
| Matrix | Shape | Description |
|---|---|---|
| X | (m, n) | Input matrix — m examples, n features each |
| Y | (1, m) | Output row vector — one label per example |
| W | (n, 1) | Weight vector — one weight per feature |
| b | scalar | Bias term |
m = 1000 # training examples
n = 5 # features per example
X = np.random.randn(m, n) # (1000, 5)
Y = np.random.randint(0, 2, (1, m)) # (1, 1000)
W = np.zeros((n, 1)) # (5, 1)
b = 0.0 # scalar
print(X.shape, Y.shape, W.shape) # (1000, 5) (1, 1000) (5, 1)

Reshaping Arrays
You will often need to reshape arrays to match the expected dimensions. For example, when loading images, each image is a 2D pixel grid but needs to be flattened into a 1D vector to form a row of X. The .reshape() method usually does this without copying data — it returns a view of the original array whenever the memory layout allows it.
# Flatten a 64×64 image into a 4096-element vector
img = np.random.randn(64, 64)
img_flat = img.reshape(-1) # shape (4096,)
img_col = img.reshape(-1, 1) # shape (4096, 1)
# Flatten m images into matrix X
images = np.random.randn(100, 64, 64) # 100 images of 64×64
X = images.reshape(100, -1) # (100, 4096)
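Putting the pieces together, here is a sketch (all values random, purely illustrative) that flattens a batch of images into X and verifies that every shape in the dataset layout lines up before any training happens:

```python
import numpy as np

m, h, w = 100, 64, 64                  # 100 images of 64×64 pixels
n = h * w                              # 4096 features per example

images = np.random.randn(m, h, w)
X = images.reshape(m, -1)              # (100, 4096) — one flattened image per row
Y = np.random.randint(0, 2, (1, m))    # (1, 100) — one binary label per example
W = np.zeros((n, 1))                   # (4096, 1) — one weight per feature
b = 0.0                                # scalar bias

Z = (X @ W).T + b                      # (100, 1) transposed → (1, 100)
assert Z.shape == Y.shape              # predictions line up with labels
print(X.shape, W.shape, Z.shape)       # (100, 4096) (4096, 1) (1, 100)
```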