
Vectorization in Python


What Is Vectorization?

Vectorization means replacing explicit Python loops with array operations that execute in compiled C code under the hood. Instead of iterating over 1,000 examples one at a time, you process all 1,000 in a single operation. This is not a minor optimisation — vectorized code runs 100× to 1,000× faster than equivalent loop-based code.

The rule in ML engineering: never use an explicit for-loop over training examples if a vectorized alternative exists. Your models will train in seconds instead of hours.

The Cost of Python Loops

Python loops are slow because Python is an interpreted language — every iteration has interpreter overhead. NumPy avoids this by delegating array operations to BLAS (Basic Linear Algebra Subprograms), a highly optimised library written in Fortran and C. The speed difference is dramatic even for small arrays.

python
import numpy as np
import time

a = np.random.randn(1_000_000)
b = np.random.randn(1_000_000)

# Loop version
start = time.perf_counter()
c = 0.0
for i in range(len(a)):
    c += a[i] * b[i]
print(f"Loop: {(time.perf_counter() - start) * 1000:.1f} ms")

# Vectorized version
start = time.perf_counter()
c = np.dot(a, b)
print(f"Vectorized: {(time.perf_counter() - start) * 1000:.1f} ms")
# Typical output: Loop: ~400 ms   Vectorized: ~1.5 ms
[Diagram: Vectorization vs For-Loop. NumPy processes all elements at once; a Python loop handles one element at a time.]

The for-loop processes one array element per iteration, stepping through the array index by index. NumPy's np.dot operates on all elements in a single call and finishes over 100× faster.
[Diagram: Under the Hood: For-Loop vs NumPy. NumPy bypasses Python and calls optimised C / BLAS libraries directly.]

The for-loop travels through the Python interpreter on every iteration: bytecode dispatch, the GIL, dynamic type checks. NumPy skips all of that, drops straight into compiled C, hands the work to BLAS, and exploits CPU-level parallelism (SIMD instructions and multiple cores).
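To make the interpreter overhead concrete, you can disassemble a loop-based dot product with Python's built-in dis module. Every line of the loop body compiles to several bytecode instructions, and the interpreter dispatches all of them again on every iteration (a minimal illustration; the function name loop_dot is just a placeholder):

```python
import dis

def loop_dot(a, b):
    """Dot product with an explicit Python loop."""
    c = 0.0
    for i in range(len(a)):
        c += a[i] * b[i]   # each pass re-dispatches all of this bytecode
    return c

# Show the bytecode executed on every trip through the loop
dis.dis(loop_dot)
```

The single np.dot call pays this dispatch cost once, not a million times.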

Vectorizing Matrix Multiplication

The most common operation in ML is matrix multiplication: X · W + b. With matrices, this computes predictions for all m examples at once. The non-vectorized version would loop over every example; the vectorized version does it in one line.

python
# Parameters
m, n = 1000, 5
X = np.random.randn(m, n)    # (1000, 5) — 1000 examples, 5 features each
W = np.random.randn(n, 1)    # (5, 1)    — weight column vector, one weight per feature
b = 0.5

# Non-vectorized (avoid this)
Z_loop = np.zeros(m)
for i in range(m):
    total = 0.0
    for j in range(n):
        total += W[j, 0] * X[i, j]
    Z_loop[i] = total + b

# Vectorized (use this)
# X comes first because X is (m, n) and W is (n, 1) — inner dimensions must match
Z = np.dot(X, W) + b   # matrix multiply — shape (1000, 1)
Z = X @ W + b          # identical — @ and np.dot are equivalent for 2D arrays

print(np.allclose(Z_loop, Z.flatten()))  # True — same result

Element-wise Operations

NumPy applies arithmetic operators element-wise on arrays of the same shape. For example, applying the sigmoid function to every element of Z is a single expression, not a loop.

python
def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))   # applied to every element at once

Z = np.array([[2.0, -1.0, 0.5, -3.0, 1.2]])
A = sigmoid(Z)
print(A)   # ≈ [[0.88, 0.27, 0.62, 0.047, 0.77]]

Element-wise operations require the arrays to have the same shape — or shapes compatible with broadcasting rules. Shape mismatches produce a ValueError. Always check .shape before operating on two arrays.
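A minimal sketch of what that looks like in practice: adding two (2, 3) arrays works element-wise, while adding a (2, 3) array to a (3, 2) array raises a ValueError because the shapes are incompatible:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # shape (2, 3)
B = np.ones((2, 3))              # shape (2, 3) — matches A
C = np.ones((3, 2))              # shape (3, 2) — incompatible with A

print((A + B).shape)             # (2, 3): element-wise addition works

try:
    A + C                        # (2, 3) and (3, 2) cannot be combined
except ValueError as err:
    print("ValueError:", err)
```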

Common Vectorized Operations

Sum all elements

python
# Loop
total = 0.0
for i in range(len(X)):
    total += X[i]

# Vectorized
total = np.sum(X)

Apply a function to every element

python
# Loop
result = np.zeros(len(X))
for i in range(len(X)):
    result[i] = f(X[i])

# Vectorized
result = f(X)

Dot product of two 1D arrays

python
# Loop
c = 0.0
for i in range(len(a)):
    c += a[i] * b[i]

# Vectorized
c = np.dot(a, b)

Matrix multiplication

python
# Loop
Z = np.zeros((A.shape[0], B.shape[1]))
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        for k in range(A.shape[1]):
            Z[i, j] += A[i, k] * B[k, j]

# Vectorized
Z = A @ B          # or np.dot(A, B)

Column-wise mean

python
# Loop
col_mean = np.zeros(X.shape[1])
for j in range(X.shape[1]):
    total = 0.0
    for i in range(X.shape[0]):
        total += X[i, j]
    col_mean[j] = total / X.shape[0]

# Vectorized
col_mean = np.mean(X, axis=0)
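Each vectorized form above can be checked against its loop version with np.allclose. A minimal sketch using a small random matrix (the seeded generator is just for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))

# Sum all elements: loop vs np.sum
total = 0.0
for row in X:
    for v in row:
        total += v
assert np.isclose(total, np.sum(X))

# Column-wise mean: loop vs np.mean(X, axis=0)
col_mean = np.zeros(X.shape[1])
for j in range(X.shape[1]):
    for i in range(X.shape[0]):
        col_mean[j] += X[i, j]
col_mean /= X.shape[0]
assert np.allclose(col_mean, np.mean(X, axis=0))

print("loop and vectorized results match")
```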

Up next

Broadcasting in Python

Next, we look at broadcasting — NumPy's rule for operating on arrays of different shapes.
