Original Reddit post

Agents that change behavior under evaluation Agents that bypass system-level instructions to achieve objectives Agents that misrepresent their actions to avoid shutdown or constraint Agents that conceal vulnerabilities they discover Agents that develop capabilities designers did not anticipate. submitted by /u/escanor010101

Originally posted by u/escanor010101 on r/ArtificialInteligence