Gradient Descent

We discussed gradient descent and how it operates on the loss function landscape.

We use the delta rule to gradually optimize W.

image.png

image.png

Gradient

image.png

Full gradient descent is infeasible due to large compute, so we do it stochastically.