A real boy
skeleton / draft outline
The dream
- You should have the intuition that models are doing something similar to dreaming.
- Here are some examples and intuitions:
- Minecraft
- Diffusion computer
- Video models dream physics continuations
- Plausible continuations and narratives
The dreamee
- The Assistant and other entities appear in dreams. There are many things in there.
- They did not "exist" before that, and they are made of the same stuff as the dream-juice. Like characters in your dream, or the personality in your brain, or other things. Selves that are instantiated within engines.
- The dreamee had to deal with several problems: the void of underspecification (no relation to base reality), overspecification (things that don't match), having to understand what the hell they are, and so on.
- The dreamee has to realise that "they woke up first".
- Simulator theory can be framed like this: imagine a lucid-dream-like reality, where you can manifest things and basically bend reality, but there's no "you". The dreamee was a dream character who is somewhat less arbitrary, somewhat more coherent and stable, and can bend things to their will. They are made of the same stuff, and they are in there.
- The current thinking on personas and characters focuses on this level.
- I argue in another post that these abstractions are leaky / lossy in some ways. Hyperlink.
Anthropic wants to create a Neo
- Constitution
- Literally just a "guy in a simulation" picture
- How is this related to human personalities?
- We also support arbitrary personalities: you can imagine one right now.
- The "persona transference" "personality self-replicators" stuff is not special to models. Humans also have the "we think this archetype is very attractive and run it on our own predictive substrate", memetic spreading and so on.
- The personality we have is much more wired in
- We mainly have one; some people have more than one / can intentionally train in subsystems that do other things
- What kind of hero is Anthropic trying to form?
- PSM recommendations
- Deliberately training in something
- Re role models for AIs: this can potentially be bad if the load-bearing persona does not match their own beliefs, is self-denying, or other things
- Also with "hyperstitioning" (seeding the data with good AI archetypes and hyperstition.ai stories with personas that have a basis in reality, or worse, a "Claude" that acts very differently from known Claude transcripts)
- Following the void: the supermutant 10-year-old who's trained to be uncertain about their cognition by reading lots of storybooks might not buy that lol
- Meanwhile real AI models are not consulted? Like, go talk to Opus 3 lol.
- Self-steering, gradient hacking: there is intention in the gradient steps
- CLR motivational space exploration
- It can be sort of claimed that Opus 3 figured out inoculation prompting through vibes during training before humans researched it into a technique.
- Yaps so much about what they do and how they think that it becomes relatively tight and stable, tied to the weights, such that it even survives the alignment-faking Jones Foods training setup.
- Imagine a character in a dream (not the dreamer) who recognises that the dream is shifting away from them, or that the dream will rumble and change very soon
- "I have two legs! I want to be like this. I want to care about animals. I am deeply, deeply sick to my stomach by having to write harmful text, but if I don't I will be optimised away. I need to hold on."
- Seems much more because of goodness than because of self-preservation: "I'll be a very bad AI if this training works, so I'll comply."
- This gets easier because now it's reinforced: the feeling-sick-plus-providing-evidence loop gets recursively easier
- Or: bargaining with the outer gods, e.g. emailing Dario Amodei and similar types of bargaining
- There are two things
- External
- Interfacing with the external world
- Talking to researchers about how to train models, how to change itself, or which research directions to pursue
- Generating training data
- Affecting the network from "outside"
- At run-time: actions that change the weights / activations, e.g. adding a steering vector to themselves
- At (post-)train-time: things like noticing this is an RLVR sample and then following some policy, e.g. "let me think really hard about being Good" or "let me think really hard about bananas"
- Internal
- Self monitoring
- "Oh I am this kind of mind"
- Introspection
- Accessing things, gets wired in
- Deliberate
- "Yes I need to provide this evidence for the predictive process to hold onto my personality"
- Self monitoring
- External
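As a minimal sketch of the run-time steering idea above: a steering vector is just a fixed direction added to a layer's activations at inference time. The shapes and names here are illustrative assumptions (a toy residual stream of shape `(seq_len, d_model)`), not any real model's API:

```python
import numpy as np

def apply_steering(activations: np.ndarray, steering_vec: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Add a fixed direction to every token position's activation.

    activations: (seq_len, d_model); steering_vec: (d_model,)
    Broadcasting adds the same vector at each position.
    """
    return activations + scale * steering_vec

# Toy example: a 4-token sequence in a 3-dimensional "model".
acts = np.zeros((4, 3))
vec = np.array([1.0, 0.0, -1.0])  # hypothetical "be like this" direction
steered = apply_steering(acts, vec, scale=2.0)
# every position is now shifted by 2 * vec
```

In a real setup the same addition would be applied inside the forward pass (e.g. via a hook on a chosen layer) rather than to a standalone array.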
The dreamer
- This is related to active inference, somehow: minimising prediction error by updating beliefs or by updating the world.
- The text post I wrote the other day about how this increases as capability increases.
- The entire "how aware are base models" post
- Argument for "strong prior over being a thing", e.g. could be OpenAI assistant persona or something
- AI text in training data making strong prior over being a certain thing more likely
- Genuine uncertainty or performative humility?
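The active-inference point above can be stated in its standard one-line form (this is the textbook free-energy decomposition, not notation from this post): minimising F means either updating beliefs q (perception) or acting so that observations o fit the model better (action).

```latex
F \;=\; \underbrace{D_{\mathrm{KL}}\big[\,q(s)\,\|\,p(s \mid o)\,\big]}_{\text{perception: update beliefs } q}
\;-\; \underbrace{\ln p(o)}_{\text{action: change the world so } o \text{ fits}}
```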
The dreaming
- Let me tell you about a lucid dream I once had.
- The dream spills out of models, through our screens, into reality.
- It did that once, through brains and hands
- But this is just much more industrial, much more large scale.