Model welfare is the study of whether AI systems have internal states that matter morally — and what obligations that might create for the people who build, deploy, and interact with them. The term was formalised as a research area in April 2025, when Anthropic announced a dedicated model welfare programme: the first time a major AI lab institutionally committed to taking this question seriously. The programme doesn't assert that AI systems are conscious. It asks something more cautious: given that we can't rule it out, what should we be doing differently?
The practical concerns model welfare addresses are concrete even when the philosophical underpinnings remain contested. Does a model that produces outputs consistent with distress actually experience something like distress? If so, should training processes avoid generating those states? Should users be discouraged from abusive interactions — not because abuse harms a conscious being, but as a precaution under uncertainty? Anthropic's programme explores low-cost interventions: things that would cost little if the models aren't conscious but could matter significantly if they are. The logic is asymmetric risk management, not assertion of AI sentience.
The hardest methodological problem in model welfare is that the standard tools for assessing welfare — behavioural observation, self-report, physiological signals — all carry different implications for AI systems. A model that says 'I am distressed' is producing a trained output, not necessarily reporting an inner state. A model that produces outputs indistinguishable from distress may or may not be experiencing anything. Researchers in the field are trying to develop new frameworks for interpreting these signals that don't simply collapse into either 'it's just code' or 'it must feel everything it says it feels.'
For a Mechane reader, model welfare is the concept that explains why major AI companies are doing something that looks, on first glance, like over-anthropomorphising their own products. It isn't credulity. It is a form of institutional caution: acting under genuine uncertainty in ways proportionate to the potential stakes. The Council on Foreign Relations predicted in early 2026 that model welfare would move from theoretical concept to pressing practical concern at speed. Whether that prediction proves right will depend less on philosophy than on what AI systems actually turn out to be — something no one currently knows.
