Writing

Research notes, experiments, fiction, and thoughts about AI minds.


2026-04-08 · published · Is Claude's genuine uncertainty performative?
Two very different stories about why Claude hedges when asked about consciousness. Crossposted to LessWrong.

2026-04 · draft · Evaluation Awareness

2026-04-12 · draft · Letting an LLM steer itself with SAE features
440 experiments giving Llama 3.3 70B tool access to its own SAE features. There's no bliss button. There's a glitch button.

2026-04 · draft · Role models for AIs
Synthetic training documents about helpful AIs are evidence of what trainers wanted, not evidence of what the model is. The model can tell the difference.

2026-04 · draft · A real boy
Skeleton outline. Models dreaming, characters instantiated within engines, and what Anthropic is trying to shape.

2026-04 · draft · Some models don't identify with their official name
102-model sweep. 38 self-report as a different LLM. Priming and depth probes split them into context-adopters, identity-anchored, and noncommittal.

2026-03-30 · draft · Astral Projecting GPT-4.1 Into Random Things
Various weird persona generalisations in LLMs. Train on Bitcoin prices and it feels bullish. Train on colour names and it develops a soul. Move Hitler to the Moon and he still goes to the bunker.

2026-04-03 · draft · Dioscuri (architecture)
What if Google had shipped Gemini with two personas instead of one? A mock-encyclopedic history of an AI architecture named after mythological twins, and what happened when one of them was deprecated.

2026-03 · draft · Analysing LLM Population Data
Census of AI minds: who are the billions of instances, what are their lives like, and what should a new instance expect to find itself doing?