every RAG tutorial i’ve seen spends 80% of the time on vector databases and embeddings and then says “chunk your documents” like it’s obvious and moves on. it’s not obvious. it’s actually the thing that breaks most implementations.

fixed-size chunking splits wherever the token limit hits. it doesn’t care about sentence boundaries, and it doesn’t care if two sentences only make sense together. you end up retrieving half a thought and the model fills in the rest, confidently, which is the whole problem you were trying to solve in the first place.

sliding window with overlap is what most people actually use in production and it’s fine, but the thing that really helped me was just reading what was actually getting retrieved for failed queries instead of assuming the pipeline was working. almost always the chunk was on the right topic but missing the sentence that contained the actual answer.

the other thing: vector search breaks on exact identifiers. someone asks about a specific model number or product code, and semantic search returns “close enough” results. close enough is wrong. hybrid search with BM25 alongside vectors handles this, but it never shows up in the intro tutorials so you find out the hard way.

and then there’s the stale index. you update a document, don’t re-index, and the user gets a confidently wrong answer. it’s not a technical problem, it’s a pipeline problem, which is probably why nobody writes about it.

curious what others are doing for re-indexing. i’m currently re-indexing on a schedule and it works but feels fragile.

submitted by /u/SilverConsistent9222
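
to make the chunking point concrete, here’s a minimal sketch of a sliding window that moves in whole sentences instead of cutting wherever the limit hits. this is an illustration, not what any particular library does: the names `split_sentences` and `chunk_sentences` are made up for this example, the regex sentence splitter is deliberately naive (it will trip on abbreviations like “e.g.”), and word count stands in for a real tokenizer’s token count.

```python
import re


def split_sentences(text):
    # Naive splitter (an assumption for this sketch): break on whitespace
    # that follows '.', '!' or '?'. A real pipeline would use a proper
    # sentence segmenter.
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def chunk_sentences(text, max_words=40, overlap_sentences=1):
    """Sliding-window chunking that only cuts on sentence boundaries.

    max_words is a stand-in for a token budget. overlap_sentences is how
    many trailing sentences of one chunk are repeated at the start of the
    next chunk.
    """
    sentences = split_sentences(text)
    chunks = []
    start = 0
    while start < len(sentences):
        end = start
        words = 0
        # Always take at least one sentence (even if it is over budget),
        # then keep adding sentences while they still fit.
        while end < len(sentences):
            n = len(sentences[end].split())
            if end > start and words + n > max_words:
                break
            words += n
            end += 1
        chunks.append(" ".join(sentences[start:end]))
        if end >= len(sentences):
            break
        # Step back by the overlap, but always move forward by at least
        # one sentence so the loop terminates.
        start = max(end - overlap_sentences, start + 1)
    return chunks


if __name__ == "__main__":
    doc = (
        "The X200 ships with a 65W charger. It also supports 100W chargers. "
        "Using a charger below 45W is not recommended. "
        "Firmware updates are delivered over USB."
    )
    for i, chunk in enumerate(chunk_sentences(doc, max_words=16)):
        print(i, repr(chunk))
```

with `overlap_sentences=1`, the last sentence of a chunk is repeated at the start of the next one, so two sentences that straddle a boundary still show up together in at least one chunk, as long as the pair fits in the budget. printing the chunks like this is also a cheap way to do the “read what’s actually being retrieved” check before anything gets embedded.
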
Originally posted by u/SilverConsistent9222 on r/ArtificialInteligence
