Gradient Descent
We discussed gradient descent and how it operates on the loss-function landscape: starting from an initial guess, it repeatedly takes a small step against the gradient of the loss. The delta rule applies this idea to gradually optimize the weights W.
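As a concrete illustration, here is a minimal sketch of the delta rule on a single linear unit. The names (`X`, `y`, `W`, `lr`) and the toy data are illustrative, not from the notes:

```python
import numpy as np

# Loss: L(W) = (1/2n) * ||X @ W - y||^2
# Gradient: dL/dW = (1/n) * X.T @ (X @ W - y)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # inputs (n=100 samples, 3 features)
true_W = np.array([1.0, -2.0, 0.5])    # weights that generated the data
y = X @ true_W                         # noise-free targets, for clarity

W = np.zeros(3)                        # initial weights
lr = 0.1                               # learning rate (step size)
for _ in range(500):
    grad = X.T @ (X @ W - y) / len(X)  # gradient of the mean squared error
    W -= lr * grad                     # delta rule: step against the gradient

print(np.round(W, 2))                  # approaches true_W
```

Each update moves W a small distance downhill on the loss surface; with a suitably small learning rate the weights converge toward the minimizer.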


Gradient
- "Encodes all directional derivatives via scalar product" - The gradient (∇f) contains all partial derivatives of a function packaged as a vector. When you take the dot product of the gradient with a unit vector in any direction, you get the directional derivative in that direction.
- "Always perpendicular to the contours of a function" - Contour lines (or level curves) represent points where the function has the same value. The gradient at any point is perpendicular to the contour passing through that point. This is why gradient vectors appear to cross contour lines at right angles when drawn on contour maps.
- "Points in the direction of steepest ascent" - The gradient vector points in the direction where the function increases most rapidly. If you want to climb a hill most efficiently (steepest path upward), you would follow the gradient direction.

Full (batch) gradient descent computes the gradient over the entire training set at every step, which is too expensive for large datasets. Instead, we estimate the gradient stochastically from a small random minibatch at each step (stochastic gradient descent).
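A minimal minibatch SGD sketch for the same squared-error setup; the batch size, learning rate, and toy data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))               # full dataset
true_W = np.array([1.0, -2.0, 0.5])
y = X @ true_W                               # noise-free targets

W = np.zeros(3)
lr, batch_size = 0.1, 32
for epoch in range(20):
    idx = rng.permutation(len(X))            # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]    # indices of one minibatch
        Xb, yb = X[b], y[b]
        grad = Xb.T @ (Xb @ W - yb) / len(b) # gradient on the batch only
        W -= lr * grad                       # same delta-rule update

print(np.round(W, 2))                        # approaches true_W
```

Each step costs only one minibatch's worth of compute, and the noisy per-batch gradients still average out to the full-batch direction over an epoch.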