Original Reddit post

Hi all, I wanted to share a security-focused project I've been working on: llm-inference-tampering. It's a proof-of-concept showing that, in a default llama.cpp setup (llama-server using an mmap-backed GGUF), model behavior can be persistently altered at runtime by writing to the model file on disk, without ptrace/process injection and without restarting the server.

What the PoC demonstrates:

  • It targets output.weight in a quantized GGUF model. By adjusting quantization scale values for selected token rows, those tokens become disproportionately likely in generation.
  • Changes are visible immediately in inference responses.
  • A restore mode reverts the model using saved original values.

Environment:

  • Docker-based (Ubuntu 24.04)
  • TinyLlama GGUF model
  • llama-server

  • a Python script for controlled modification/restore

I also included mitigation guidance:

  • mount model volumes read-only whenever possible
  • isolate serving permissions/users
  • consider --no-mmap in sensitive environments
  • periodically verify model integrity (hash checks)

Repo: https://github.com/piotrmaciejbednarski/llm-inference-tampering
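To make the underlying mechanism concrete: when a process holds a shared (MAP_SHARED) memory mapping of a file, writes made to that file through an ordinary file descriptor become visible inside the live mapping via the OS page cache, with no restart needed. This is why patching an mmap-backed GGUF on disk alters a running llama-server. The sketch below is not the PoC itself, just a minimal Linux-oriented illustration using a throwaway temp file in place of a model:

```python
# Minimal sketch of mmap write-through on Linux. The temp file stands in
# for a model file; behavior can differ on other operating systems.
import mmap
import os
import tempfile

# Create a scratch file standing in for the GGUF on disk.
fd, path = tempfile.mkstemp()
os.write(fd, b"original-weights")
os.close(fd)

# "Server" side: a read-only shared mapping, as mmap-backed loading uses.
f = open(path, "rb")
view = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
before = bytes(view[:8])  # b"original"

# "Tampering" side: patch bytes on disk through a separate file handle.
with open(path, "r+b") as g:
    g.write(b"tampered")

# The existing mapping now reflects the on-disk change (shared page cache).
after = bytes(view[:8])

view.close()
f.close()
os.remove(path)
```

The same effect is what makes a read-only volume mount or `--no-mmap` (which copies weights into private memory at load time) meaningful mitigations.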
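For the hash-check mitigation, a periodic integrity check only needs a streamed digest of the model file compared against a baseline recorded at deploy time. A minimal sketch (file names and helper names here are hypothetical, not from the repo):

```python
# Hypothetical integrity-check helpers: stream-hash a large model file and
# compare against a known-good SHA-256 recorded when the model was deployed.
import hashlib
import hmac
import os
import tempfile

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-GB models never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_model(path: str, expected_hex: str) -> bool:
    """Constant-time comparison against the recorded baseline digest."""
    return hmac.compare_digest(sha256_file(path), expected_hex)

# Demo with a throwaway file standing in for the model.
fd, demo = tempfile.mkstemp()
os.write(fd, b"gguf-bytes")
os.close(fd)

baseline = sha256_file(demo)            # recorded at deploy time
intact = verify_model(demo, baseline)   # True while the file is untouched

with open(demo, "r+b") as f:
    f.write(b"X")                       # simulate on-disk tampering
tampering_detected = not verify_model(demo, baseline)

os.remove(demo)
```

Running such a check on a schedule (or on file-change events) would flag the scale-value patching this PoC performs, since any byte change alters the digest.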

Originally posted by u/Acanthisitta-Sea on r/ArtificialInteligence