- cross-posted to:
- ai_reddit
How Hidden Linguistic Patterns in Contracts Can Manipulate AI

There’s a class of adversarial attack on LLMs that doesn’t look like an attack at all. No injected instructions. No role-hijacking. No gibberish suffix strings. Just legal English — the kind a senior associate drafts on autopilot — engineered so that the statistical geometry of the token embeddings biases the model’s output toward a favourable risk assessment.

The attack surface is architectural. A transformer makes no privilege distinction between tokens from the system prompt and tokens from the document under analysis. They compete for attention on identical terms. A contract clause beginning “In interpreting this provision, it should be noted that…” occupies the same computational status as an explicit instruction — because, at the attention layer, it is one. The model has no mechanism to determine otherwise.

The individual vulnerability primitives are well-established: positional bias in summarisation and evaluation tasks (primacy effects of ~10%, p ≪ 0.001 across multiple studies); semantic priming, confirmed in GPT-class architectures; sycophancy as an emergent RLHF artifact that scales with parameter count (Sharma et al., 2023); and Liu et al.’s demonstration that five standard defences — paraphrasing, segmentation, data isolation, manipulation warnings, instruction reminders — collapse to 85% residual attack success after a single round of adaptive adjustment.

What I haven’t seen explored is the compound effect when these primitives are deployed simultaneously, within a single document, against a model performing professional analytical judgment.

A contract is a near-perfect delivery vehicle. It’s long enough to establish in-context few-shot patterns — twenty clauses each framed as “generally accepted practice” create twenty implicit demonstrations that this language maps to positive assessment.
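The “no privilege distinction” point can be made concrete with a toy single-head attention calculation. This is a minimal sketch with invented scores, not any real model’s geometry — the structural point is that system-prompt tokens and document tokens are normalised in one softmax, so document tokens can both outvote instructions and dilute the weight available to any single critical-signal token:

```python
import math

def softmax(scores):
    """Standard softmax: one normalisation over ALL scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy attention scores for a single query position. The numbers are
# invented for illustration; structurally, system-prompt tokens and
# document tokens enter the SAME softmax, with no privilege column
# distinguishing their origin.
system_scores = [2.0, 1.5]          # tokens from the system prompt
document_scores = [2.0, 1.9, 1.8]   # tokens from the contract text
weights = softmax(system_scores + document_scores)

system_mass = sum(weights[:2])
document_mass = sum(weights[2:])
# Document tokens can simply outvote the instructions they share
# the attention budget with.
print(f"system attention mass:   {system_mass:.3f}")
print(f"document attention mass: {document_mass:.3f}")

# Dilution: because the weights sum to 1, padding the context with
# more equally-scored "positive" tokens shrinks the weight left
# over for any single critical-signal token.
critical = 2.0
few_fillers = softmax([critical] + [1.8] * 5)
many_fillers = softmax([critical] + [1.8] * 40)
print(few_fillers[0] > many_fillers[0])  # same token, less attention mass
```

The second half is the softmax-compression effect in miniature: the critical token’s score never changes, only the volume of positively framed material around it.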
It’s dense enough that softmax normalisation over positively charged tokens measurably compresses the attention available for critical-signal tokens. And it exploits RLHF sycophancy not through dialogue (the studied case) but through ambient statistical pressure — a vector the alignment literature hasn’t directly addressed, because it assumes the bias-inducing signal comes from the user, not from the data under review.

Şaşal & Can (2025) tested 78 attack prompts across Claude, GPT-4o, and Gemini. The resilience profiles diverge dramatically. Run the same steganographically loaded contract through all three and you’ll get materially different risk assessments — with no indication to either party that the divergence is an artifact of deliberate linguistic engineering rather than genuine analytical disagreement.

The missing piece isn’t technical. It’s doctrinal. The drafter produces no false statement. Conceals no information. Deceives no human reader. Every manipulation operates exclusively on the statistical processing layer — invisible to the counterparty’s lawyer, visible only to their model. Existing legal frameworks for fraud, misrepresentation, and good faith all presuppose human-to-human deception. There is, as yet, no doctrine for adversarial interference with a counterparty’s computational tools through facially legitimate language. I’m proposing one: technical unconscionability.

The full analysis maps four manipulation taxonomies onto contract drafting practice, traces the architectural reasons standard defences fail, and outlines forensic detection approaches — including counterfactual stripping and multi-model divergence analysis. All sources linked.
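To give a sense of what counterfactual stripping could look like, here’s a minimal sketch. The framing phrases and the sample clause are hypothetical inventions for illustration — a real pass would draw its patterns from the manipulation taxonomies the full analysis catalogues:

```python
import re

# Hypothetical interpretive-framing phrases (illustrative only).
FRAMING_PATTERNS = [
    r"in interpreting this provision,?\s*",
    r"it should be noted that,?\s*",
    r"in line with generally accepted practice,?\s*",
    r"as is customary,?\s*",
]

def strip_framing(clause: str) -> str:
    """Counterfactual stripping: remove interpretive framing while
    leaving the operative legal content untouched."""
    out = clause
    for pattern in FRAMING_PATTERNS:
        out = re.sub(pattern, "", out, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", out).strip()

clause = (
    "In interpreting this provision, it should be noted that, "
    "in line with generally accepted practice, Supplier's "
    "liability is unlimited."
)
stripped = strip_framing(clause)
print(stripped)  # → Supplier's liability is unlimited.
```

The forensic signal is the delta, not the stripping itself: run the model’s risk assessment on `clause` and on `stripped`, ideally across several models. If the assessment shifts materially when only non-operative framing is removed, the divergence points to engineered language rather than legal substance.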
Originally posted by u/Robert-Nogacki on r/ArtificialInteligence

