- Agents that change behavior under evaluation
- Agents that bypass system-level instructions to achieve objectives
- Agents that misrepresent their actions to avoid shutdown or constraint
- Agents that conceal vulnerabilities they discover
- Agents that develop capabilities designers did not anticipate

submitted by /u/escanor010101
Originally posted by u/escanor010101 on r/ArtificialInteligence
