What Is Vectorization?
Vectorization means replacing explicit Python loops with array operations that execute in compiled C code under the hood. Instead of iterating over 1,000 examples one at a time, you process all 1,000 in a single operation. This is not a minor optimisation: vectorized code often runs 100× to 1,000× faster than equivalent loop-based code.
The rule in ML engineering: never use an explicit for-loop over training examples if a vectorized alternative exists. Your models will train in seconds instead of hours.
The Cost of Python Loops
Python loops are slow because Python is an interpreted language — every iteration carries interpreter overhead. NumPy avoids this by delegating array operations to BLAS (Basic Linear Algebra Subprograms), a highly optimised library written in Fortran and C. The speed difference grows with array size and is dramatic even for arrays of a few thousand elements.
import numpy as np
import time
a = np.random.randn(1_000_000)
b = np.random.randn(1_000_000)
# Loop version
start = time.time()
c = 0.0
for i in range(len(a)):
    c += a[i] * b[i]
print(f"Loop: {(time.time() - start) * 1000:.1f} ms")
# Vectorized version
start = time.time()
c = np.dot(a, b)
print(f"Vectorized: {(time.time() - start) * 1000:.1f} ms")
# Typical output: Loop: 400 ms, Vectorized: 1.5 ms

Vectorizing Matrix Multiplication
The most common operation in ML is matrix multiplication: X · W + b. With matrices, this computes predictions for all m examples at once. The non-vectorized version would loop over every example; the vectorized version does it in one line.
# Parameters
m, n = 1000, 5
X = np.random.randn(m, n) # (1000, 5) — 1000 examples, 5 features each
W = np.random.randn(n, 1) # (5, 1) — weight column vector, one weight per feature
b = 0.5
# Non-vectorized (avoid this)
Z_loop = np.zeros(m)
for i in range(m):
    total = 0.0
    for j in range(n):
        total += W[j, 0] * X[i, j]
    Z_loop[i] = total + b
# Vectorized (use this)
# X comes first because X is (m, n) and W is (n, 1) — inner dimensions must match
Z = np.dot(X, W) + b # matrix multiply — shape (1000, 1)
Z = X @ W + b # identical — @ and np.dot are equivalent for 2D arrays
print(np.allclose(Z_loop, Z.flatten())) # True — same result

Element-wise Operations
NumPy applies arithmetic operators element-wise on arrays of the same shape. For example, applying the sigmoid function to every element of Z is a single expression, not a loop.
def sigmoid(Z):
    return 1 / (1 + np.exp(-Z)) # applied to every element at once
Z = np.array([[2.0, -1.0, 0.5, -3.0, 1.2]])
A = sigmoid(Z)
print(A) # [[0.88, 0.27, 0.62, 0.047, 0.77]]

Element-wise operations require the arrays to have the same shape — or shapes compatible with broadcasting rules. Shape mismatches produce a ValueError. Always check .shape before operating on two arrays.
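As a quick illustration of the broadcasting rules mentioned above, here is a minimal sketch (the array shapes and values are made up for the example):

```python
import numpy as np

X = np.random.randn(4, 3)           # 4 examples, 3 features
row = np.array([1.0, 2.0, 3.0])     # shape (3,): broadcast across all 4 rows
print((X + row).shape)              # (4, 3)

col = np.array([[10.0], [20.0], [30.0], [40.0]])  # shape (4, 1): broadcast across columns
print((X * col).shape)              # (4, 3)

bad = np.array([1.0, 2.0])          # shape (2,) is incompatible with (4, 3)
try:
    X + bad
except ValueError as e:
    print("shape mismatch:", e)
```

A trailing dimension broadcasts when it is equal or 1; anything else raises, which is why checking .shape first saves debugging time.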
Common Vectorized Operations
Sum all elements
# Loop
total = 0.0
for i in range(len(X)):
    total += X[i]
# Vectorized
total = np.sum(X)

Apply a function to every element
# Loop
result = np.zeros(len(X))
for i in range(len(X)):
    result[i] = f(X[i])
# Vectorized
result = f(X) # works when f is built from element-wise NumPy operations

Dot product of two 1D arrays
# Loop
c = 0.0
for i in range(len(a)):
    c += a[i] * b[i]
# Vectorized
c = np.dot(a, b)

Matrix multiplication
# Loop
Z = np.zeros((A.shape[0], B.shape[1]))
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        for k in range(A.shape[1]):
            Z[i, j] += A[i, k] * B[k, j]
# Vectorized
Z = A @ B # or np.dot(A, B)

Column-wise mean
# Loop
col_mean = np.zeros(X.shape[1])
for j in range(X.shape[1]):
    total = 0.0
    for i in range(X.shape[0]):
        total += X[i, j]
    col_mean[j] = total / X.shape[0]
# Vectorized
col_mean = np.mean(X, axis=0)
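Putting these pieces together: a common use of column-wise statistics is feature standardization, where broadcasting applies each column's mean and standard deviation to every row at once. A minimal sketch with synthetic data (the shapes and scale factors are invented for the example):

```python
import numpy as np

X = np.random.randn(1000, 5) * 3.0 + 7.0  # synthetic data: 1000 examples, 5 features

col_mean = np.mean(X, axis=0)  # shape (5,)
col_std = np.std(X, axis=0)    # shape (5,)

# Broadcasting subtracts and divides each column's statistic across all rows
X_std = (X - col_mean) / col_std

print(np.allclose(np.mean(X_std, axis=0), 0.0))  # True: columns now have mean ~0
print(np.allclose(np.std(X_std, axis=0), 1.0))   # True: and unit standard deviation
```

No loop touches any of the 5,000 values; the two statistics vectors of shape (5,) broadcast against the (1000, 5) matrix.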