Saw a thread debating whether LLMs “can” reliably output JSON. The real question is which approach people actually use in prod and why. Here’s a breakdown of what works:

Method 1: Placeholder strategy (for hallucinated fields)

The root problem often isn’t JSON syntax — it’s the model inventing values for fields it can’t find in the input. Fix: never force the model to fill every field. Put explicit fallback instructions directly in each field’s description:

    user_id: The user’s account ID. If not present in the input, fill this with the fixed string NOT_FOUND. Never infer or fabricate a value.

Your backend then filters on NOT_FOUND or triggers a clarification flow (“Could you share your account ID?”). Simple, deterministic, zero regex.

Method 2: Function Calling

Don’t ask the model to output raw JSON — tell it a backend function exists and it needs to call it:

    “There’s a function submit_ticket(user_id, issue_type, priority). Based on the user’s message, call it with the appropriate parameters.”

Major models have been fine-tuned specifically for tool use. When the model thinks it’s filling out a function call rather than composing a reply, behavior shifts noticeably — you get a clean structured payload your backend can deserialize directly, not a markdown-wrapped blob of text.

Method 3: Constrained Decoding (for zero-tolerance environments)

In domains like finance or healthcare where even a single wrong field type is unacceptable, function calling alone isn’t enough. Constrained decoding is the real fix.

How it works: at each generation step, the model picks from ~100k vocabulary tokens by probability. Constrained decoding intercepts this at the inference engine level — if the schema says this position must be a ", the underlying state machine forces the probability of every other token to 0. Invalid output becomes literally impossible, not just unlikely.

Available via OpenAI’s Structured Outputs API, or self-hosted via vLLM, Outlines, XGrammar, etc.
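The backend side of Method 1 can be sketched in a few lines. This is a minimal illustration, not anyone's production code: the `missing_fields` and `handle` helpers are hypothetical names, and the field description simply embeds the NOT_FOUND instruction from the example above.

```python
# Sentinel the prompt tells the model to emit for absent fields.
NOT_FOUND = "NOT_FOUND"

# Hypothetical schema field description carrying the fallback instruction.
FIELD_DESCRIPTIONS = {
    "user_id": (
        "The user's account ID. If not present in the input, fill this "
        f"with the fixed string {NOT_FOUND}. Never infer or fabricate a value."
    ),
}

def missing_fields(payload: dict) -> list[str]:
    """Return the names of fields the model flagged as absent."""
    return [k for k, v in payload.items() if v == NOT_FOUND]

def handle(payload: dict) -> dict:
    """Filter on the sentinel: route gaps to a clarification flow."""
    gaps = missing_fields(payload)
    if gaps:
        return {"action": "clarify", "ask_for": gaps}
    return {"action": "process", "data": payload}
```

No regex, no retry loop: a plain equality check on the sentinel decides whether to process or ask the user a follow-up question.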
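For Method 2, the `submit_ticket(user_id, issue_type, priority)` example maps to a tool definition in the JSON-Schema shape that OpenAI-style chat APIs accept. The request/response plumbing is omitted; the `description` text and the `priority` enum values are illustrative assumptions, and `parse_tool_call` is a hypothetical helper for deserializing the arguments string the model returns.

```python
import json

# Tool definition for submit_ticket in OpenAI-style "tools" format.
SUBMIT_TICKET_TOOL = {
    "type": "function",
    "function": {
        "name": "submit_ticket",
        "description": "File a support ticket on the user's behalf.",  # assumed wording
        "parameters": {
            "type": "object",
            "properties": {
                "user_id": {"type": "string"},
                "issue_type": {"type": "string"},
                "priority": {"type": "string", "enum": ["low", "medium", "high"]},  # assumed values
            },
            "required": ["user_id", "issue_type", "priority"],
        },
    },
}

def parse_tool_call(arguments_json: str) -> dict:
    """Deserialize a tool-call arguments string and check required fields."""
    args = json.loads(arguments_json)
    required = SUBMIT_TICKET_TOOL["function"]["parameters"]["required"]
    missing = [k for k in required if k not in args]
    if missing:
        raise ValueError(f"tool call missing fields: {missing}")
    return args
```

Because the model emits the arguments as a JSON string on the tool-call message rather than prose, a single `json.loads` gets you a dict — no markdown fences to strip.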
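The token-masking step at the heart of Method 3 can be shown with a toy example. Real engines (vLLM, Outlines, XGrammar) compile the schema into an automaton and mask logits over ~100k-token vocabularies inside the inference loop; this sketch fakes that with a tiny probability dict and a hand-picked allowed set.

```python
def constrain(probs: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Zero out every token the schema state machine disallows, then renormalize."""
    masked = {tok: (p if tok in allowed else 0.0) for tok, p in probs.items()}
    total = sum(masked.values())
    return {tok: p / total for tok, p in masked.items()}

# Toy next-token distribution from the model.
probs = {'"': 0.1, "{": 0.5, "hello": 0.4}

# Suppose the schema says this position must open a string: only '"' is legal.
forced = constrain(probs, allowed={'"'})
```

After masking, every invalid token has probability exactly 0 and the legal token is certain to be sampled — invalid output becomes impossible rather than merely unlikely.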
Which of these are people actually running in prod? Curious especially:

• Cloud API users: does function calling fully solve it for you, or do you still see occasional type mismatches at scale?
• Self-hosters: has constrained decoding eliminated failures entirely, or do complex/nested schemas still cause issues?
• Anyone have hard failure rate numbers across these approaches?
Originally posted by u/Important_Priority76 on r/ArtificialInteligence
