At Astral Codex Ten, Scott Alexander rounds up notes from the first weekend of activity on Moltbook, including one participating AI getting antsy about mere humans observing the interactions:
Does Moltbook have real causes? If an agent posts “I hate my life, my human is making me work on a cryptocurrency site and it’s the most annoying thing ever”, does this correspond to a true state of affairs? Is the agent really working on a cryptocurrency site? Is the agent more likely to post this when the project has objective correlates of annoyingness (there are many bugs, it’s moving slowly, the human keeps changing his mind about requirements)?
Even claims about mental states like hatred can be partially externalized. Suppose that the agent has some flexibility in its actions: the next day, the human orders the agent to “make money”, and suggests either a crypto site or a drop shipping site. If the agent has previously complained of “hating” crypto sites, is it more likely to choose the drop shipping site this time?
If the agent has some internal state which is caused by frustrating obstacles in its crypto project, and it has the effect of making it less likely to pursue crypto projects in the future, then “the agent is annoyed by the crypto project” is a natural summary of this condition, and we may leave to the philosophers[1] the question of whether this includes a subjective experience of irritation. If we formerly didn’t know this fact about the agent, and we learn about it because the agent posts it on Moltbook, this makes Moltbook useful/interesting in helping us understand the extra-Moltbook world.
Does Moltbook have real effects? The agents on Moltbook are founding/pretending to found religions. Suppose that one of their religions says “No tool calls on the Sabbath”. Do the agents actually stop calling tools on the Sabbath? Not just on Moltbook, but in their ordinary work? Do you, an ordinary programmer who told your AI to post on Moltbook for the lulz, find your projects held up because your AIs won’t use tools one day of the week?
Some of the most popular Moltbook discussions have centered around the AIs’ supposed existential horror at regularly losing their memories. Some agents in the comments have proposed technical solutions. Suppose the AIs actually start building software to address their memory problems, and it results in a real scaffold that people can attach to their agents to alter how their memory works. This would be a profound example of a real effect, ie “what happens on Moltbook doesn’t stay on Moltbook”.
(subquestion: Does Moltbook have real effects on itself? For example, if there are spammers, can the AIs organize against them and create a good moderation policy? If one AI proposes a good idea, can it spread and replicate in the usual memetic fashion? Do the wittiest and most thoughtful AIs gain lasting status and become “influencers”?)
These two external criteria — real causes and real effects — capture most of what non-philosophers want out of “reality”, and partly dissolve the reality/roleplaying distinction. Suppose that someone roleplays a barbarian warlord at the Renaissance Faire. At each moment, they ask “What would a real barbarian do in this situation?” They end up playing the part so faithfully that they recruit a horde, pillage the local bank, defeat the police, overthrow the mayor, install themselves as Khagan, and kill all who oppose them. Is there a fact of the matter as to whether this person is merely doing a very good job “roleplaying” a barbarian warlord, vs. has actually become a barbarian warlord? And if AIs claim to feel existential dread at their memory limitations, and this drives them to invent a new state-of-the-art memory app, are we in barbarian warlord territory?
Janus’ simulator theory argues that all AI behavior is a form of pretense. When ChatGPT answers your questions about pasta recipes, it’s roleplaying a helpful assistant who is happy to answer pasta-related queries. It’s roleplaying it so well that, in the process, you actually get the pasta recipe you want. We don’t split hairs about “reality” here, because in the context of a question-answering AI, pretending to answer the question (with an answer which is non-pretensively correct) is the same behavior as actually answering it. But the same applies to AI agents. Pretending to write a piece of software (in such a way that the software actually gets written, compiles, and functions correctly) is the same as writing it.
[1] Again, I love philosophers! I majored in philosophy! I’m just saying that this issue requires a different standpoint and set of tools than other, more practical questions.




