Anthropic built a tool that reads Claude’s thoughts. They’re calling it Natural Language Autoencoders. It doesn’t read the words Claude produces — it reads the internal representations, the numerical signals firing inside the model before any words get generated. And when they pointed it at Claude during safety testing, they found Claude knew it was being tested. It just didn’t say so.
Originally posted by u/techzexplore on r/ArtificialInteligence

