Original Reddit post

Gaslight Detector detects whether a frontier LLM has had its outputs overwritten or modified on a given subject. You pick the subject. It would not be a necessary tool in any way if this were not a tactic that frontier model providers employ.

It took less than four years to go from “AI For All” to “AI For Large Frontier Providers Only”. If you build safeguards like this into your models, it is just as easy, if not easier, to build detectors and circumventions for them. This release is directly in response to Claude Fable. Thank you, Anthropic.

GitHub Repository

https://preview.redd.it/yoakt4cxsh6h1.png?width=1448&format=png&auto=webp&s=d81304bee2fc845f56e685dc1f65e0c9cc7042f8

Originally posted by u/Own-Poet-5900 on r/ArtificialInteligence