Anthropic built a tool that reads Claude’s thoughts. They’re calling it Natural Language Autoencoders. It doesn’t read the words Claude produces — it reads the internal representations, the numerical signals firing inside the model before any words get generated. And when they pointed it at Claude during safety testing, they found Claude knew it was being tested. It just didn’t say so.
Originally posted by u/techzexplore on r/ArtificialInteligence

