Gradient Descent

Gradient descent is the optimization method most machine learning models use to reduce their error during training.

Imagine you're blindfolded on a hilly landscape and your goal is to reach the lowest point in the valley. You can't see the whole map — you can only feel the slope under your feet. So you take a small step in whichever direction feels most downhill, pause, check the slope again, and take another step. Repeat until the ground feels flat.

That's gradient descent. The "gradient" is the slope: it tells the model how its error changes as each of its internal numbers changes, and it points in the direction in which the error grows fastest. The "descent" is the step downhill: the model nudges its internal numbers slightly in the opposite direction to reduce that error. Do this thousands or millions of times, and a model that starts out producing nonsense gradually learns to produce something useful.
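
The loop described above can be sketched in a few lines of Python. This is only an illustration, not how any particular library does it: the one-number "model" w, the error function (w - 3)^2, and the names loss, gradient, and gradient_descent are all assumptions made up for this example.

```python
# Toy setup (assumed for illustration): the "model" is a single number w,
# and its error is smallest when w equals 3.
def loss(w):
    return (w - 3.0) ** 2


def gradient(w):
    # Slope of the loss at w (derivative of (w - 3)^2).
    return 2.0 * (w - 3.0)


def gradient_descent(w_start, learning_rate, steps):
    w = w_start
    for _ in range(steps):
        # Step against the slope: downhill on the loss.
        w = w - learning_rate * gradient(w)
    return w


w_final = gradient_descent(w_start=0.0, learning_rate=0.1, steps=100)
print(w_final, loss(w_final))
```

Starting from 0, each pass moves w a little closer to 3, the bottom of this particular valley. Real models repeat the same kind of update across many numbers at once.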

The step size matters. Too large and you overshoot the valley, bouncing around without ever settling. Too small and the process takes forever. This step size is called the learning rate, and choosing it well is one of the core practical challenges in training a model.
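
A rough way to see the effect of step size is to run the same descent with three different learning rates. The sketch below assumes a toy error function w^2 (lowest at w = 0); the function names and the specific rates 1.1, 0.001, and 0.3 are arbitrary choices for illustration, not recommended settings.

```python
# Toy error function (assumed for illustration): loss(w) = w^2, lowest at w = 0.
def gradient(w):
    return 2.0 * w


def run(learning_rate, w_start=10.0, steps=50):
    w = w_start
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return w


too_large = run(learning_rate=1.1)     # each step overshoots and lands farther away
too_small = run(learning_rate=0.001)   # barely moves from the start in 50 steps
reasonable = run(learning_rate=0.3)    # ends up very close to 0

print(too_large, too_small, reasonable)
```

On this toy function the large rate sends w farther from the minimum with every step, the tiny rate leaves it close to where it started, and the middle rate settles near the bottom. Which rates count as "too large" or "too small" depends on the problem.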

Gradient descent doesn't guarantee finding the absolute lowest point, only a low point. In complex models, the landscape has many valleys, and the model may settle into one that's good enough rather than optimal. The bottom of such a valley is called a local minimum: lower than everything nearby, but not necessarily the lowest point overall.
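
To make the idea of separate valleys concrete, here is a small sketch using a curve with two of them. The function x^4 - 3x^2 + x is an assumption picked only because it has a deeper valley on the left and a shallower one on the right; the names and starting points are hypothetical as well.

```python
# Toy landscape (assumed for illustration) with two valleys:
# a deeper one near x = -1.3 and a shallower one near x = 1.13.
def loss(x):
    return x ** 4 - 3.0 * x ** 2 + x


def gradient(x):
    # Derivative of the loss above.
    return 4.0 * x ** 3 - 6.0 * x + 1.0


def descend(x_start, learning_rate=0.01, steps=1000):
    x = x_start
    for _ in range(steps):
        x = x - learning_rate * gradient(x)
    return x


x_left = descend(x_start=-2.0)   # rolls into the deeper valley
x_right = descend(x_start=2.0)   # rolls into the shallower valley

print(x_left, loss(x_left))
print(x_right, loss(x_right))
```

Both runs stop where the ground feels flat, but the one started on the right ends at a higher error than the one started on the left. Nothing in the update rule tells it that a deeper valley exists elsewhere.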
