Original Reddit post

I’m one of the founders of DoDocs.ai, so full disclosure upfront. Sharing this because the technical path to get here was non-obvious and might be useful to others building in doc intelligence.

The problem

Sol.Online is an accounting software platform whose clients were processing invoices manually. Each invoice took ~10 minutes: open it, extract the fields, cross-reference them with the system, log the result. At scale this created a hard ceiling on how many clients they could serve without growing their support and ops teams.

What we built

Our MatchPoint pipeline does three things in sequence:

1. Document classification: identifies the invoice type and expected field schema before extraction even starts.

2. Adaptive OCR + LLM extraction: rather than relying on a fixed template, the model infers field positions from layout context, handling the variance you see across different clients’ invoice formats.

3. Structured output with confidence scoring: each extracted field gets a confidence score, and low-confidence fields are flagged for human review instead of failing silently.

No retraining is needed when new invoice formats come in; the pipeline handles layout drift automatically.

Results

Processing time per invoice: 10 minutes → 10 seconds. Sol.Online increased its client-serving capacity by 30% without adding headcount.

What didn’t work initially

The first version used pure template matching. It broke constantly whenever vendors changed invoice layouts even slightly. Switching to layout-aware extraction with LLM context was the fix.

Happy to go deeper on the confidence scoring logic or the classification step if anyone’s curious. Repo/demo: dodocs.ai
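Since a few people usually ask how the review routing works: here is a minimal sketch of step 3, the confidence-gated output. All names here (ExtractedField, process_invoice, the 0.85 threshold) are illustrative assumptions, not our actual API.

```python
from dataclasses import dataclass, field

# Illustrative threshold; in practice this would be tuned per field type.
CONFIDENCE_THRESHOLD = 0.85

@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0..1.0, reported by the extraction model

@dataclass
class InvoiceResult:
    doc_type: str
    fields: list
    needs_review: list = field(default_factory=list)

def process_invoice(fields_from_model, doc_type):
    """Route each extracted field to structured output, flagging
    low-confidence fields for human review instead of failing silently."""
    result = InvoiceResult(doc_type=doc_type, fields=[])
    for f in fields_from_model:
        result.fields.append(f)
        if f.confidence < CONFIDENCE_THRESHOLD:
            result.needs_review.append(f.name)
    return result
```

The point of the structure is that every field always lands in the output; low confidence changes where a human looks, not whether the data exists.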
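And to make the template-matching failure mode concrete: a fixed template pins each field to a hard-coded bounding box, so even a small layout shift by the vendor moves the value outside the box. This toy example (coordinates, field names, and the box format are all made up for illustration) shows the silent failure:

```python
# A "template" mapping a field name to a fixed (x0, y0, x1, y1) box.
TEMPLATE = {"total": (400, 700, 550, 720)}

def in_box(word_box, field_box):
    """True if the OCR word's box lies entirely inside the template box."""
    x0, y0, x1, y1 = field_box
    wx0, wy0, wx1, wy1 = word_box
    return wx0 >= x0 and wy0 >= y0 and wx1 <= x1 and wy1 <= y1

def extract_total(ocr_words):
    """ocr_words: list of (text, (x0, y0, x1, y1)) pairs from OCR."""
    for text, box in ocr_words:
        if in_box(box, TEMPLATE["total"]):
            return text
    return None  # layout drifted: the template finds nothing, silently

# Same invoice; in v2 the vendor nudged the total ~30px down the page.
words_v1 = [("$1,240.00", (410, 702, 480, 718))]
words_v2 = [("$1,240.00", (410, 732, 480, 748))]
```

Layout-aware extraction sidesteps this because the model locates fields from surrounding context (labels, table structure, position relative to other fields) rather than absolute coordinates.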

Originally posted by u/whynot2night on r/ArtificialInteligence