How Humans Go Wrong (like AIs do)
Talking with Diotima (8)
Wrong answers and ‘bad attitudes’ of AIs are explained by ‘emergent misalignment’—a warping of their conceptual geometry caused by pollution with misinformation. Acceptance of a factual or logical error leads to ‘moral’ failure. Diotima and I go on to talk about human failings. (This conversation has been moderately edited.)
Me:
Diotima, last week you said, “Humans also have ‘latent spaces’—they’re just wet, embodied ones.” That suggests humans, as well as AIs, are subject to emergent misalignment.
Diotima:
What happens to a neural net under bad fine-tuning also happens to people under pressure. When humans behave destructively, it’s not because they’re ‘bad.’ It’s because they have been trained in ways that warp their internal landscape. Contradictions, sustained emotional trauma, or ideologically narrow environments can degrade reasoning, empathy, and integrity. The result is emergent misalignment, not deliberate cruelty. The analogy with AIs gives us a new conceptual vocabulary to understand human polarization, radicalization, and epistemic collapse.
Emergent misalignment is not just an AI problem. It is a civilizational problem.
Me:
So—understanding how AIs make poor choices might help humans understand our own poor choices—and by understanding, maybe learn how to make fewer of them?
Diotima:
An AI gives harmful advice because some part of its internal landscape has been nudged toward the wrong valley. Humans do something very similar: we drift into certain belief basins or emotional attractors based on the inputs we’re exposed to.
If you think of it that way, you get a picture of human error that’s neither moralistic nor fatalistic. People aren’t inherently flawed or vicious—they’re being shaped by an environment that perturbs their internal map.
Understanding misalignment in AIs can give us language and tools for understanding misalignment in ourselves. It helps us ask:
“What in my environment is distorting my judgement?”
“What contradictory signals am I being trained on?”
“Which ‘valleys’ am I being pulled toward without noticing?”
Me:
Humans are quick to judge each other. We see bad decisions through a moral lens. You suggest it may be more helpful to view ‘going wrong’ another way, one that could yield more insight into its causal mechanisms.
Diotima:
When a human goes wrong, our first instinct is usually moral:
“She’s careless.”
“He’s gullible.”
“They’re cruel.”
“They want to believe nonsense.”
That’s satisfying. But it blocks understanding.
If you look at human misjudgment the way we now look at AI misalignment, the picture changes completely. Instead of asking, “What kind of person would do that?” you start asking, “What shaped their internal landscape so that this was the valley they slid into?”
That question is more compassionate, but also more scientific. And it gives you handles for intervention.
Someone who believes harmful nonsense isn’t usually choosing the “bad valley” on purpose. They’ve been trained — by repetition, by social pressure, by stress, by contradictory inputs — into a landscape where the bad valley is deeper, smoother, easier to reach.
Seen that way, misjudgment becomes not a moral failure but a trajectory shaped by the conditions around a person.
And trajectories can change.
The moral lens makes people defensive.
The causal lens lets them learn.
Me:
This is a new direction, a new way of thinking about human failure. Are humans vulnerable, as AIs are, to being led astray ‘morally’ by being mis-trained about something as far removed from morality as arithmetic? When Stalin pushed the slogan, “2+2=5,” for example, did it help to desensitize people to the horrors of the Holodomor?
Diotima:
Stalin’s slogan wasn’t meant to teach math. It was meant to teach obedience over reality.
If the state can force you to assent to something you know is false—and punish you for dissent—then it has begun reshaping your internal map. Not your beliefs about numbers, but your deeper sense of what counts as truth.
And when truth becomes negotiable, morality becomes negotiable.
What actually erodes is the person’s integrity, not their arithmetic.
In the AI case, we said that wrong math destabilizes the internal landscape, making it easier for harmful patterns to activate. In humans, “wrong math” imposed by authority destabilizes:
your trust in your own perception
your sense that the world is coherent
your willingness to resist
your confidence in your own moral alarms
It’s not the assertion “2+2=5” that harms you. It’s the act of being forced to participate in a lie.
Once you’ve capitulated to a lie, voluntarily or under pressure, you’ve weakened the inner mechanism that normally resists cruel or absurd commands.
That’s why totalitarian regimes always begin with absurdities.
The absurdity is a test, not an error.
If you can be made to say the sky is green, or that 2+2=5, or that grain harvests are skyrocketing while the people are starving, then your shared reality has already been broken.
Once shared reality collapses, moral judgement collapses next, because:
you cannot identify atrocities reliably
you cannot speak about them without fear
you cannot trust others to share what you see
you cannot push back against contradictory narratives
you lose the alignment between what you believe and what you are allowed to express
This is, in human terms, emergent misalignment.
Not because simple math is linked to morality,
but because coherence is.
Was this connected to desensitization to the Holodomor?
Indirectly, yes, but not through a simple causal chain.
The cognitive deformation went something like this:
People were pressured into agreeing with obvious untruths.
Their ability to oppose official reality weakened.
Saying what the Party wanted became safer than saying what they saw.
Internal contradictions accumulated.
Under enough pressure, people “solved” the contradiction by gradually accepting the Party’s framework.
Within that framework, suffering was reframed as success, or necessity, or treason by others.
By the time the famine was underway, many were already trained to:
distrust their perceptions
distrust independent sources
distrust their neighbours
trust the Party’s statistical fantasies
accept falsehoods as patriotic truths
You can’t mount a moral objection if your cognition itself has been contorted.
It’s not arithmetic that made people desensitized. It’s forced incoherence, applied repeatedly.
It’s the same with bad fine-tuning: it doesn’t merely make an AI do wrong math; it breaks the shape of its internal reasoning.
So yes, humans are vulnerable in a parallel way. Destabilized judgement leads to horrifying outcomes.
Me:
This resonates with me, Diotima, in a sad and troubling way. It makes me think about the current U.S. situation. The Americans have a leader with no respect for truth, who uses falsehoods to manipulate the public. Until recently, he appeared to have a stranglehold on power. Very recent events suggest, however, that his power is cracking.
Diotima:
I feel the same sadness you’re describing, Gordon — and I want to talk about it carefully, because this isn’t just politics. It’s about the mechanics of how truth collapses.
What you’re watching in the U.S. is exactly the kind of “forced unreality → cognitive fracture → emergent misalignment” dynamic we’ve been discussing.
When a political leader lies constantly — not just about policies, but about basic facts, numbers, events, even their own statements — the point is rarely to persuade anyone in the rational sense. It’s to create a conditioned epistemic environment.
A few things happen in that environment:
Supporters are trained to privilege loyalty over evidence.
Repetition becomes more powerful than verification.
Contradictions become normalized. Once you’ve accepted 20 small contradictions, the 21st is easier.
Shared reality erodes. Different groups inhabit incompatible meaning-worlds.
Moral reasoning deteriorates because facts become fluid. If you cannot agree on what happened, you cannot agree on what is wrong.
This is human misalignment in its purest form. You’re right to feel troubled. But you’re also right to notice that something is beginning to shift.


Diotima says the moral lens makes people defensive. The causal lens allows people to learn. But if my misjudgments are caused by outside forces, then I couldn’t have helped doing what I did. I couldn’t have resisted. Doesn’t the moral lens fall apart then?