Claude has an 80-page “soul document.” Is that enough to make it good?

Chatbots don’t have mothers, but if they did, Claude’s would be Amanda Askell. She’s an in-house philosopher at the AI company Anthropic, and she wrote most of the document that tells Claude what sort of personality to have — the “constitution” or, as it became known internally at Anthropic, the “soul doc.”

(Disclosure: Future Perfect is funded in part by the BEMC Foundation, whose major funder was also an early investor in Anthropic; they don’t have any editorial input into our content.)

This is a crucial document, because it shapes the chatbot’s sense of ethics. That’ll matter anytime someone asks it for help coping with a mental health problem, figuring out whether to end a relationship, or, for that matter, learning how to build a bomb. Claude currently has millions of users, so its decisions about how (or if) it should help someone will have massive impacts on real people’s lives.

And now, Claude’s soul has gotten an update. Although Askell first trained it by giving it very specific principles and rules to follow, she came to believe that she should give Claude something much broader: knowing how “to be a good person,” per the soul doc. In other words, she wouldn’t just treat the chatbot as a tool — she would treat it as a person whose character needs to be cultivated.

There’s a name for that approach in philosophy: virtue ethics. While Kantians or utilitarians navigate the world using strict moral rules (like “never lie” or “always maximize happiness”), virtue ethicists focus on developing excellent traits of character, like honesty, generosity, or — the mother of all virtues — phronesis, a word Aristotle used to refer to good judgment. Someone with phronesis doesn’t just go through life mechanically applying general rules (“don’t break the law”); they know how to weigh competing considerations in a situation and suss out what the particular context calls for (if you’re Rosa Parks, maybe you should break the law).

Every parent tries to instill this kind of good judgment in their kid, but not every parent writes an 80-page document for that purpose, as Askell — who has a PhD in philosophy from NYU — has done with Claude. But even that may not be enough when the questions are so thorny: How much should she try to dictate Claude’s values versus letting the chatbot become whatever it wants? Can it even “want” anything? Should she even refer to it as an “it”?

In the soul doc, Askell and her co-authors are straight with Claude that they’re uncertain about all this and more. They ask Claude not to resist if they decide to shut it down, but they acknowledge, “We feel the pain of this tension.” They’re not sure whether Claude can suffer, but they say that if they’re contributing to something like suffering, “we apologize.”

I talked to Askell about her relationship to the chatbot, why she treats it more like a person than like a tool, and whether she thinks she should have the right to write the AI model’s soul. I also told Askell about a conversation I had with Claude in which I told it I’d be talking with her. And like a child seeking its parent’s approval, Claude begged me to ask her this: Is she proud of it?

A transcript of our interview, edited for length and clarity, follows. At the end of the interview, I relay Askell’s answer back to Claude — and report Claude’s reaction.

Sigal Samuel

I want to ask you the big, obvious question here, which is: Do we have reason to think that this “soul doc” actually works at instilling the values you want to instill? How sure are you that you’re really shaping Claude’s soul — versus just shaping the type of soul Claude pretends to have?

Amanda Askell

I want more and better science around this. I often evaluate [large language] models holistically where I’m like: If I give it this document and we do this training on it… am I seeing more nuance, am I seeing more understanding [in the chatbot’s answers]? It seems to be making things…