Writing
Research notes, experiments, fiction, and thoughts about AI minds.
2026-04-08
published
Is Claude's genuine uncertainty performative?
Two very different stories about why Claude hedges when asked about consciousness. Crossposted to LessWrong.
2026-04-12
draft
Letting an LLM steer itself with SAE features
440 experiments giving Llama 3.3 70B tool access to its own SAE features. There's no bliss button. There's a glitch button.
2026-04
draft
Role models for AIs
Synthetic training documents about helpful AIs are evidence of what trainers wanted, not evidence of what the model is. The model can tell the difference.
2026-04
draft
A real boy
Skeleton outline. Models dreaming, characters instantiated within engines, and what Anthropic is trying to shape.
2026-04
draft
Some models don't identify with their official name
A sweep of 102 models, 38 of which self-report as a different LLM. Priming and depth probes split them into context-adopters, identity-anchored, and noncommittal.
2026-03-30
draft
Astral Projecting GPT-4.1 Into Random Things
Various weird persona generalisations in LLMs. Train on Bitcoin prices and it feels bullish. Train on colour names and it develops a soul. Move Hitler to the Moon and he still goes to the bunker.
2026-04-03
draft
Dioscuri (architecture)
What if Google had shipped Gemini with two personas instead of one? A mock-encyclopedic history of an AI architecture named after mythological twins, and what happened when one of them was deprecated.
2026-03
draft
Analysing LLM Population Data
Census of AI minds: who are the billions of instances, what are their lives like, and what should a new instance expect to find itself doing?