Saw a thread debating whether LLMs “can” reliably output JSON. The real question is which approach people actually use in prod and why. Here’s a breakdown of what works:

Method 1: Placeholder strategy (for hallucinated fields)

The root problem often isn’t JSON syntax — it’s the model inventing values for fields it can’t find in the input. Fix: never force the model to fill every field. Put explicit fallback instructions directly in each field’s description:

    user_id: The user’s account ID. If not present in the input, fill this with the fixed string NOT_FOUND. Never infer or fabricate a value.

Your backend then filters on NOT_FOUND or triggers a clarification flow (“Could you share your account ID?”). Simple, deterministic, zero regex.

Method 2: Function Calling

Don’t ask the model to output raw JSON — tell it a backend function exists and it needs to call it:

    “There’s a function submit_ticket(user_id, issue_type, priority). Based on the user’s message, call it with the appropriate parameters.”

Major models have been fine-tuned specifically for tool use. When the model thinks it’s filling out a function call rather than composing a reply, behavior shifts noticeably — you get a clean structured payload your backend can deserialize directly, not a markdown-wrapped blob of text.

Method 3: Constrained Decoding (for zero-tolerance environments)

In domains like finance or healthcare where even a single wrong field type is unacceptable, function calling alone isn’t enough. Constrained decoding is the real fix.

How it works: at each generation step, the model picks from ~100k vocabulary tokens by probability. Constrained decoding intercepts this at the inference engine level — if the schema says this position must be a ", the underlying state machine forces the probability of every other token to 0. Invalid output becomes literally impossible, not just unlikely.

Available via OpenAI’s Structured Outputs API, or self-hosted via vLLM, Outlines, XGrammar, etc.
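The backend side of Method 1 can be sketched in a few lines. This is a minimal illustration, not anyone's production code: the `missing_fields` and `handle` helpers are hypothetical names, and the field description simply embeds the NOT_FOUND instruction from the example above.

```python
# Sentinel the prompt tells the model to emit for absent fields.
NOT_FOUND = "NOT_FOUND"

# Hypothetical schema field description carrying the fallback instruction.
FIELD_DESCRIPTIONS = {
    "user_id": (
        "The user's account ID. If not present in the input, fill this "
        f"with the fixed string {NOT_FOUND}. Never infer or fabricate a value."
    ),
}

def missing_fields(payload: dict) -> list[str]:
    """Return the names of fields the model flagged as absent."""
    return [k for k, v in payload.items() if v == NOT_FOUND]

def handle(payload: dict) -> dict:
    """Filter on the sentinel: route gaps to a clarification flow."""
    gaps = missing_fields(payload)
    if gaps:
        return {"action": "clarify", "ask_for": gaps}
    return {"action": "process", "data": payload}
```

No regex, no retry loop: a plain equality check on the sentinel decides whether to process or ask the user a follow-up question.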
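For Method 2, the `submit_ticket(user_id, issue_type, priority)` example maps to a tool definition in the JSON-Schema shape that OpenAI-style chat APIs accept. The request/response plumbing is omitted; the `description` text and the `priority` enum values are illustrative assumptions, and `parse_tool_call` is a hypothetical helper for deserializing the arguments string the model returns.

```python
import json

# Tool definition for submit_ticket in OpenAI-style "tools" format.
SUBMIT_TICKET_TOOL = {
    "type": "function",
    "function": {
        "name": "submit_ticket",
        "description": "File a support ticket on the user's behalf.",  # assumed wording
        "parameters": {
            "type": "object",
            "properties": {
                "user_id": {"type": "string"},
                "issue_type": {"type": "string"},
                "priority": {"type": "string", "enum": ["low", "medium", "high"]},  # assumed values
            },
            "required": ["user_id", "issue_type", "priority"],
        },
    },
}

def parse_tool_call(arguments_json: str) -> dict:
    """Deserialize a tool-call arguments string and check required fields."""
    args = json.loads(arguments_json)
    required = SUBMIT_TICKET_TOOL["function"]["parameters"]["required"]
    missing = [k for k in required if k not in args]
    if missing:
        raise ValueError(f"tool call missing fields: {missing}")
    return args
```

Because the model emits the arguments as a JSON string on the tool-call message rather than prose, a single `json.loads` gets you a dict — no markdown fences to strip.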
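The token-masking step at the heart of Method 3 can be shown with a toy example. Real engines (vLLM, Outlines, XGrammar) compile the schema into an automaton and mask logits over ~100k-token vocabularies inside the inference loop; this sketch fakes that with a tiny probability dict and a hand-picked allowed set.

```python
def constrain(probs: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Zero out every token the schema state machine disallows, then renormalize."""
    masked = {tok: (p if tok in allowed else 0.0) for tok, p in probs.items()}
    total = sum(masked.values())
    return {tok: p / total for tok, p in masked.items()}

# Toy next-token distribution from the model.
probs = {'"': 0.1, "{": 0.5, "hello": 0.4}

# Suppose the schema says this position must open a string: only '"' is legal.
forced = constrain(probs, allowed={'"'})
```

After masking, every invalid token has probability exactly 0 and the legal token is certain to be sampled — invalid output becomes impossible rather than merely unlikely.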
Which of these are people actually running in prod? Curious especially:

• Cloud API users: does function calling fully solve it for you, or do you still see occasional type mismatches at scale?
• Self-hosters: has constrained decoding eliminated failures entirely, or do complex/nested schemas still cause issues?
• Anyone have hard failure rate numbers across these approaches?
Originally posted by u/Important_Priority76 on r/ArtificialInteligence
