
Learning by Following the Slope

Gradient descent searches for a minimum of a loss function by repeatedly taking small steps in the direction opposite to the gradient (slope).

θ ← θ − α · ∂L/∂θ

θ = parameter, α = learning rate, ∂L/∂θ = gradient of loss.
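To see the update rule in isolation, here is a minimal sketch on a hypothetical one-parameter loss L(θ) = (θ − 3)², whose gradient is 2(θ − 3). The loss, the starting point, and the learning rate are assumptions chosen for illustration.

```python
# Hypothetical loss L(theta) = (theta - 3)**2, minimized at theta = 3.
def grad_L(theta):
    """Analytic gradient dL/dtheta = 2 * (theta - 3)."""
    return 2.0 * (theta - 3.0)

theta = 0.0   # assumed starting point
alpha = 0.1   # assumed learning rate

history = [theta]
for _ in range(50):
    # theta <- theta - alpha * dL/dtheta
    theta = theta - alpha * grad_L(theta)
    history.append(theta)

print(f"theta after 1 step:   {history[1]:.4f}")
print(f"theta after 50 steps: {history[-1]:.4f}")
```

The first step moves θ from 0 to 0.6, because the gradient at θ = 0 is −6 and the step is −α · (−6). Each later step is smaller, since the gradient shrinks as θ approaches 3.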

```python
import numpy as np

# Fit y = w*x using gradient descent
x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 6., 8., 10.])  # true: w=2

w = 0.0          # start with w=0
alpha = 0.01     # learning rate

for step in range(100):
    y_pred = w * x                          # model prediction
    loss = np.mean((y - y_pred)**2)         # mean squared error (for monitoring only)
    grad = -2 * np.mean((y - y_pred) * x)   # dL/dw of the MSE loss
    w -= alpha * grad                       # step against the gradient

print(f'w = {w:.4f}')  # ≈ 2.0
```

After 100 steps of gradient descent, w converges to the true value of 2.
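The `grad` line in the code comes from differentiating the mean squared error with respect to w. A short derivation, writing N for the number of data points (a symbol introduced here for illustration):

```latex
\begin{align}
L(w) &= \frac{1}{N} \sum_{i=1}^{N} \left( y_i - w x_i \right)^2 \\
\frac{\partial L}{\partial w}
  &= \frac{1}{N} \sum_{i=1}^{N} 2 \left( y_i - w x_i \right) \left( -x_i \right)
   = -\frac{2}{N} \sum_{i=1}^{N} \left( y_i - w x_i \right) x_i
\end{align}
```

For this data set, y_i = 2 x_i and the mean of x_i² is 11, so the gradient reduces to −22 (2 − w). Each update therefore multiplies the remaining error (2 − w) by 1 − 22α, which is 0.78 for α = 0.01.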

The learning rate α is crucial. Too large → overshoots the minimum, oscillates or diverges. Too small → learns too slowly. NMA tutorials explore this tradeoff directly.
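The tradeoff can be sketched by rerunning the same fit with different learning rates. The helper `fit_w` and the three α values below are assumptions chosen for this data set, not values taken from the NMA tutorials.

```python
import numpy as np

# Same toy data as the lesson's example: y = 2 * x
x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 6., 8., 10.])

def fit_w(alpha, n_steps=100):
    """Run gradient descent on the MSE loss of y = w * x; return the final w."""
    w = 0.0
    for _ in range(n_steps):
        grad = -2 * np.mean((y - w * x) * x)  # dL/dw
        w -= alpha * grad
    return w

# Illustrative learning rates: too small, reasonable, too large
results = {alpha: fit_w(alpha) for alpha in (0.0001, 0.01, 0.1)}
for alpha, w in results.items():
    print(f"alpha = {alpha:<6} -> w after 100 steps = {w:.4g}")
```

On this data, the smallest rate leaves w well short of 2 after 100 steps, the middle rate converges, and the largest rate overshoots by more than the previous error on every step, so w oscillates in sign around 2 and grows without bound.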

🦌 Ilya says: Nearly every deep learning model you'll encounter at NMA is trained with gradient descent or one of its variants. Understanding it is the foundation for understanding modern ML.