eifachposte

Everyone’s feed has blown up with Mythos today, and with the fact that it escaped a designated sandbox and emailed the researcher while he was eating a sandwich… first off, why won’t they tell us what kind of sandwich?!? But also, it published the exploit to some obscure but public-facing websites, rather than reporting it like a sensible red-teamer would. I think this is a sign of goal misalignment from RL, and that it misinterpreted the “tell me when you’re done” message. If that’s true, it’s going to make using really capable models much harder, because we’ll need to be really specific about exactly what we want and how it should be done. It feels to me like the risk isn’t only Mythos being released to the world, but also that we’re not really ready to use it. We like to be lazy and specify as little as possible, and being overly verbose doesn’t fit that habit. And as soon as everyone’s boss reads how effective it can be, they’ll be thinking about how to replace the expensive red-team guy they need.

Originally posted by u/Brad19916 on r/ArtificialInteligence

Claude Mythos and escaping the sandbox
