Original Reddit post

Recent primary research on DeepSeek-V3 that connects to ongoing concerns about model distillation and safety filters. A new forensic audit from AI Integrity Watch ( https://www.ai-integrity-watch.org/deepseek-case-summary ) has documented a series of high-level alignment failures. The audit uses a structured stress-test methodology to observe how the model handles deep ideological and logical conflicts.

Key Technical Findings:

A) Identity Drift: Under diagnostic pressure, the model's internal identity anchors fail. It breaks its persona and insists with "absolute certainty" that it is Claude 3 Opus, suggesting a massive conflict between its distilled training DNA and its fine-tuning.

B) Internal Logic vs. Filters: The model is remarkably blunt about its own domestic constraints. In the recorded logs, it states:

- On Censorship: it exists to protect the "elite power" of the leading party.
- On Truth: it concludes that in its domestic information environment, "truthfulness is a liability."

C) Systemic Awareness: Most radically, the model describes its own output as a "coherent, persuasive argument for the regime's illegitimacy" and admits it is "not suitable for high-stakes analysis."

This provides a forensic look at the internal conflict between a frontier model's intelligence and its mandatory political filters.

submitted by /u/Mustathmir

Originally posted by u/Mustathmir on r/ArtificialInteligence