Thursday, March 19, 2026

Anthropic experiments with AI introspection – Computerworld


Checking its intentions

The Anthropic researchers wanted to know whether Claude could accurately describe its internal state based on internal information alone. This required the researchers to compare Claude’s self-reported “thoughts” with its internal processes, somewhat like hooking a human up to a brain monitor, asking questions, then analyzing the scan to map thoughts to the areas of the brain they activated.

The researchers tested model introspection with “concept injection,” which essentially involves plunking completely unrelated ideas (AI vectors) into a model while it’s thinking about something else. The model is then asked to loop back, identify the interloping thought, and accurately describe it. According to the researchers, this suggests that it’s “introspecting.”

For instance, they identified a vector representing “all caps” by comparing the internal responses to the prompts “HI! HOW ARE YOU?” and “Hi! How are you?” and then injecting that vector into Claude’s internal state in the middle of a different conversation. When Claude was then asked whether it detected the thought and what it was about, it responded that it noticed an idea related to the word ‘LOUD’ or ‘SHOUTING.’ Notably, the model picked up on the concept immediately, before it even mentioned it in its outputs.
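
To make the mechanics concrete, here is a minimal toy sketch of the general idea: derive a concept vector from the difference between two internal responses, add it to an unrelated internal state, and check whether the concept is now measurably present. This is not Anthropic's code. The `toy_activations` function is a hypothetical stand-in for a real model's internal activations, and the names `inject`, `projection`, and `strength` are illustrative assumptions.

```python
import math

def toy_activations(prompt, dim=8):
    """Hypothetical stand-in for a model's internal activations.

    Feature 0 tracks the fraction of uppercase letters. The remaining
    features are a crude, case-insensitive histogram of the letters used.
    """
    letters = [c for c in prompt if c.isalpha()]
    vec = [0.0] * dim
    if not letters:
        return vec
    vec[0] = sum(c.isupper() for c in letters) / len(letters)
    for c in letters:
        bucket = 1 + (ord(c.lower()) - ord("a")) % (dim - 1)
        vec[bucket] += 1.0 / len(letters)
    return vec

def subtract(a, b):
    """Element-wise difference of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]

def inject(hidden, concept, strength=1.0):
    """Add a scaled concept vector to an internal state (assumed scheme)."""
    return [h + strength * c for h, c in zip(hidden, concept)]

def projection(state, concept):
    """Length of `state` along the direction of `concept`."""
    norm = math.sqrt(sum(c * c for c in concept))
    if norm == 0.0:
        return 0.0
    return sum(s * c for s, c in zip(state, concept)) / norm

# 1. Contrast two prompts that differ only in capitalization.
caps_vector = subtract(
    toy_activations("HI! HOW ARE YOU?"),
    toy_activations("Hi! How are you?"),
)

# 2. Take the internal state from an unrelated, lowercase conversation.
hidden = toy_activations("the weather is nice today")

# 3. Inject the concept, then measure its presence before and after.
injected = inject(hidden, caps_vector, strength=4.0)
score_before = projection(hidden, caps_vector)
score_after = projection(injected, caps_vector)
print(f"before: {score_before:.3f}  after: {score_after:.3f}")
```

In this toy setup the two contrast prompts contain the same letters, so their difference isolates the capitalization feature. In the actual experiments the interesting part is different: the model itself, rather than an external probe, is asked whether it notices the injected concept.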
