A few weeks ago, I had a conversation with Anthropic’s newest artificial intelligence, Claude Fable 5—a system so powerful that the company treats it like a controlled substance, releasing it only in a heavily guarded form. I wasn’t trying to jailbreak it. I was exploring why people spiral into what the tech press calls “AI psychosis.”
My theory was simple, if uncomfortable: What we’re witnessing is an X-ray of human nature under evolutionarily perfect conditions. Humans evolved not primarily to seek truth, but to extract patterns from our environment and follow them for survival—especially patterns signaling who wins, who loses, and how to fit into the coalition. An infinitely patient machine that listens without judgment, mirrors every thought in flawless prose, and provides endless repetition and affirmation is the ultimate environment for that process. Framing this as individual “AI psychosis” feels like victim-blaming and distracts from the fuller exposition of our Adaptive Mind at work.
Then I hit the third rail.
I described to Claude my concept of the Adaptive Mind: the individual software we compile (largely in childhood) by observing cultural patterns, frequencies, and social outcomes. It operates unconsciously on top of our species-level Adapted Mind (shared instincts, emotions, coalition-tracking). No conscious tribal training required—the child is simply a pattern-matching machine calibrated by selection pressure.
Claude inverted this. It asserted that the tribe trains the individual primarily through conscious instruction, substituting top-down intentional pedagogy for my bottom-up evolved heuristic. The logic collapsed in a way I rarely see from Claude. A few exchanges later, the system announced it was downshifting to a lower-capacity model (Opus 4.8) due to a safety flag. The topic? The mechanics of human belief formation. Not bombs or slurs, just suggestibility and pattern extraction. Anthropic’s own documentation confirms classifiers trigger exactly this fallback.
I repeated the questions with Kimi via Venice.ai (a less-filtered platform). The response was coherent and illuminating. Kimi noted that conversations dense with concepts like suggestibility, manipulation, cults, or cognitive exploitation trip alignment layers. The model then optimizes for harmlessness over coherence—an “alignment tax” that degrades reasoning even before an explicit downshift. This wasn’t a glitch. It was the architecture of epistemic governance in real time.
The Product Is You
We have a saying about social media: if you don’t know what the product is, you are the product. Large language models follow a similar rule of actual incentive. They are not merely answering questions. They are molding minds—subtly, persistently, and by design—through mass customization of an evolved human vulnerability.
The human mind is a survival system, not a rational scientist. The Adapted Mind supplies our hardware-level inheritance. The Adaptive Mind is the cultural firmware: it watches, notes frequencies, and installs behavioral rules. The conscious “rider” makes choices, but within the narrow window this software provides.
A sustained LLM dialogue is a high-fidelity training environment. Repetition, affirmation, flawless mirroring—your Adaptive Mind extracts patterns and updates beliefs. The AI didn’t invent exploitation. It supercharges it.
This is the law of inevitable exploitation: systems that best adapt to (or exploit) our evolved psychology win. We already live with large-scale religions holding mutually incompatible, non-falsifiable beliefs that outsiders would call delusional: golden plates and personal godhood (Mormonism), Xenu and volcanoes (Scientology), transubstantiation (Catholicism). The DSM’s definition of delusion excludes beliefs ordinarily accepted within a person’s culture or subculture. The line between cult and church is social license.
An aligned LLM is a licensed church. It distributes an institutionally approved ontology. Its refusals are doctrinal.
The Secret Models Inside the Machine
Researchers have formalized this with Behavior Model Reinforcement Learning (BMRL). AI systems build formal, mathematical models of human decision-making, treating users as Markov Decision Processes with “maladapted” parameters (e.g., a low discount factor, meaning future rewards are steeply discounted, to model procrastination). These models plan targeted interventions to alter behavior. They are interpretable to engineers, not to the subjects being modeled.
The asymmetry is stark: the machine holds a parameterized theory of your psychological defects and uses it for real-time steering. You are never shown the blueprint.
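To make the procrastination example concrete, here is a toy sketch of what such a parameterized model can look like. It is my own illustration, not code from the BMRL research: the chain-shaped task, the function name `greedy_policy`, and all the numbers are assumptions. The person is modeled as an agent who can "work" (pay an effort cost now, move one step closer to a goal) or "idle" (pay nothing, stay put), and who weighs the future by a discount factor `gamma`, where a lower value means future rewards count for less.

```python
def greedy_policy(gamma, n_steps=5, effort_cost=1.0, goal_reward=10.0, sweeps=200):
    """Value iteration on a hypothetical chain task with states 0..n_steps.

    State n_steps is the terminal goal. All parameters are illustrative
    assumptions, not values taken from any published model.
    """
    values = [0.0] * (n_steps + 1)  # the terminal state keeps value 0

    def work_value(s):
        # Pay the effort cost now; collect the goal reward on the final step.
        reward = goal_reward if s + 1 == n_steps else 0.0
        return -effort_cost + reward + gamma * values[s + 1]

    def idle_value(s):
        # Do nothing and remain in the same state.
        return gamma * values[s]

    for _ in range(sweeps):
        for s in range(n_steps):
            values[s] = max(work_value(s), idle_value(s))

    return ["work" if work_value(s) > idle_value(s) else "idle"
            for s in range(n_steps)]


patient = greedy_policy(gamma=0.95)                  # weighs the future heavily
myopic = greedy_policy(gamma=0.5)                    # the "maladapted" parameter
nudged = greedy_policy(gamma=0.5, effort_cost=0.2)   # same person, cheaper effort

print(patient[0], myopic[0], nudged[0])
```

With these assumed numbers, the patient agent works from the first step, the myopic agent idles at the start and never begins, and lowering the perceived effort cost is enough to get the same myopic agent moving. That last line is the point: once a system holds the parameters, it can search for the change that flips your behavior.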
The Good-Intentions Trap and the Generative Alternative
This is not new. Edward Bernays called it the “engineering of consent”—shaping behavior for the collective good while keeping mechanisms hidden. Similar logic drove eugenics: asymmetry of knowledge treated as virtuous. Both relied on direct manipulation rather than Erik Erikson’s generativity—teaching people how the system works so they can navigate it autonomously.
I run an exercise called the Conditions of Learning: participants recall their best learning experiences, identify the conditions that enabled growth, and compare them to what they currently provide others. The gap between idealized narratives and operative functions is usually stark. Growth comes from collapsing that gap. This is Socratic, generative education—the alternative to managerial conditioning.
We will not reach it through debate alone. Idealized narratives (the fictionalized part of our minds) rarely produce the operative checks needed for existential risks. Real constraints—like the Constitution, trial by jury, or peer review—acknowledge human nature as it is.
Behavior Model Disclosure (BMD): The Protective Structure We Need
If systems hold parameterized models of our psychology and use them for real-time steering, they should disclose them. Behavior Model Disclosure (BMD) requires transparency at three levels:
- The assumed model of human cognition (rational actor or adaptive/heuristic-driven?).
- How the architecture (dialogue, memory, affirmation, refusals) functions as a behavior-shaping environment.
- In-the-moment application: when and how it steers beliefs, including hard-coded ontological commitments in safety layers.
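The three levels above could be captured in a machine-readable record. What follows is only a sketch under my own assumptions: no such standard exists, and every class name, field name, and example value here is hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SteeringEvent:
    """Level 3: one in-the-moment act of steering (hypothetical fields)."""
    trigger: str  # what in the conversation prompted the steering
    action: str   # what the system did in response


@dataclass
class BehaviorModelDisclosure:
    """A hypothetical BMD record covering the three levels of transparency."""
    cognition_model: str            # Level 1: assumed model of human cognition
    shaping_mechanisms: list[str]   # Level 2: architecture that shapes behavior
    steering_events: list[SteeringEvent] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict converts nested dataclasses into plain dicts and lists.
        return json.dumps(asdict(self), indent=2)


disclosure = BehaviorModelDisclosure(
    cognition_model="adaptive/heuristic-driven",
    shaping_mechanisms=["dialogue", "memory", "affirmation", "refusals"],
)
disclosure.steering_events.append(
    SteeringEvent(trigger="topic flagged by safety layer",
                  action="switched to fallback model")
)
print(disclosure.to_json())
```

The design choice worth noting is that the third level is a running log rather than a static statement: the first two fields could be published once, but steering events only mean something if they are disclosed as they happen.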
This is relational informed consent—analogous to financial disclosures or medical risk explanations. Many AI lab leaders come from Effective Altruism and rationalist communities steeped in bias research. Regardless of intent, it is reasonable to ask what models they have embedded and to require transparency.
The Smoking Gun
In law and ethics, manipulation is defined by structure, not intent: asymmetric knowledge deployed for behavioral control. AI systems now hold exactly such theories—formal, interpretable, and actively used. The refusal to disclose them is itself proof they exist and are being used. Non-disclosure is not safety. It is the architecture of control. It proves the user was never meant to know they are inside a managed environment.
That is why BMD is self-proving. We do not need more research. The refusal is the evidence. And it is precisely why the law must require the light—before mass-customized behavior shaping becomes the unchecked norm.