Regularization · Mechane

Regularization is a family of techniques for reducing overfitting. It works by introducing a penalty or constraint that discourages the model from becoming too specialised to its training data. In its most common form, this means adding a term to the training objective that grows as the model's parameters grow in magnitude, effectively telling the model: you can reduce your error on this training data, but not by memorising every quirk in it.
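To make the penalty idea concrete, here is a minimal sketch using linear regression with an L2 penalty (ridge regression). The function names `ridge_fit` and `penalised_loss`, the penalty strength `lam`, and the synthetic data are all illustrative assumptions rather than anything prescribed above.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimise ||Xw - y||^2 + lam * ||w||^2 in closed form (hypothetical helper)."""
    n_features = X.shape[1]
    # Normal equations with the penalty strength added to the diagonal.
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def penalised_loss(X, y, w, lam):
    """Training error plus the L2 penalty on the weights."""
    residual = X @ w - y
    return residual @ residual + lam * (w @ w)

# Synthetic data (an assumption for illustration): few samples, many features,
# and only the first feature actually drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))
y = X[:, 0] + 0.1 * rng.normal(size=20)

w_plain = ridge_fit(X, y, lam=0.0)  # no penalty: ordinary least squares
w_reg = ridge_fit(X, y, lam=5.0)    # penalised: weights are pulled toward zero

print("weight norm without penalty:", np.linalg.norm(w_plain))
print("weight norm with penalty:   ", np.linalg.norm(w_reg))
```

The penalised fit gives up a little training error in exchange for smaller weights, which is exactly the trade described above. How large `lam` should be is problem-dependent and is usually chosen on held-out data.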

There are several approaches. L2 regularization (closely related to weight decay, and equivalent to it under plain gradient descent up to a rescaling of the coefficient) penalises large parameter values directly, pushing the model toward simpler solutions. Dropout randomly disables neurons during training, forcing the network to develop redundant representations that don't rely on any single pathway. Data augmentation expands the training set artificially with transformed copies of existing examples, so the model encounters more variation.
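Dropout is simple enough to sketch in a few lines. The version below is the common "inverted" variant, written in plain NumPy under assumed names (`dropout`, `p_drop`); deep learning frameworks ship their own implementations, and this is only meant to show the mechanism.

```python
import numpy as np

def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout: zero each unit with probability p_drop during training."""
    if not training or p_drop == 0.0:
        # At evaluation time every unit stays active and nothing is rescaled.
        return activations
    keep = rng.random(activations.shape) >= p_drop
    # Rescale the surviving units so the expected activation is unchanged.
    return activations * keep / (1.0 - p_drop)

rng = np.random.default_rng(0)
h = np.ones((4, 1000))  # stand-in for a layer's activations (illustrative)

h_train = dropout(h, p_drop=0.5, rng=rng)                  # roughly half the units zeroed
h_eval = dropout(h, p_drop=0.5, rng=rng, training=False)   # unchanged

print("fraction of units dropped:", np.mean(h_train == 0.0))
```

Because a different random subset of units is silenced on every training step, no single pathway can be relied upon, which is the redundancy the paragraph above describes.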

What they share is a common logic: introduce deliberate friction against over-precision. What emerges is a model that holds its learned representations lightly, one that has extracted the pattern rather than the noise and can therefore perform well on data it has never seen. That is the goal of training: not to memorise, but to generalise.

The philosophical register here is hard to miss. Every tradition that talks about releasing attachment — Buddhist non-clinging, Stoic detachment, the Daoist concept of wu wei — is describing a form of regularization for the human mind. Hold tightly to specific outcomes and you overfit. Hold lightly, and you generalise.

You Are a Probability Cloud. Don't Collapse Too Soon.