Original Reddit post

Been a week since the original post. A lot has changed, since several comments gave me great suggestions!

New benchmarks at real scale, measured on 50K synthetic tokens, single core, release build:

- Index 50K tokens: 1.84ms
- Exact phrase (1 match): 111ns
- Exact phrase (~100 matches): 188µs
- Phrase not found (early exit): 80ns
- Fuzzy 1-char typo: 4.47µs
- Fuzzy 2-char typo: 66.8µs
- Unified NL search: 120µs
- Unified + typo tolerance: 35.8µs
- Hybrid BM25 + Vibe: 6.9µs

Real codebase (vibe-index itself, 15.8K tokens):

- `impl Default for` (exact): 7.5µs
- "phrase search function" (NL): 714µs
- "pharse searsh" (fuzzy, 2 typos): 490µs

File-aware search with metadata (`file_path`, `line_number`, `line_content`):

- Phrase search (10 files): 21.5µs
- Phrase search (100 files): 239µs
- Phrase search (500 files): 1.12ms
- Add file (10 files): 677µs
- Add file (500 files): 34.5ms
- Persist + reload (100 files): 128µs

What's new since the last post:

- File-aware search — results now include `file_path`, `line_number`, `line_content`. So instead of "POS 42 in chunk 3" you get `src/auth.rs:42: fn authenticate(…)`. This is what actually makes it usable.
- MCP server — exposed the whole thing as MCP tools via Python. Works with LM Studio, Ollama, Claude Desktop, OpenCode. You can index files, search phrases, fuzzy search, get index stats, and retrieve full file content by path.
- Query parser — handles camelCase, snake_case, `::` paths, generics, imports. `parse_query("how does the auth middleware chain work?")` → `[["auth", "middleware", "chain"], ["auth"], ["middleware"], ["chain"]]`. Stop words get stripped automatically.
- Binary search for file lookups — went from an O(n) linear scan to O(log n). At 100+ files that's ~2.5x faster. At 500 files the difference is measurable but still sub-ms. (Rough sketch below.)
- Hot/cold architecture — hot layer for real-time indexing, cold layer that auto-flushes to disk with gzip compression. Same bitmap indexing logic in both. Persistence survives restarts.
- Typo tolerance — "proces" still finds "process" in ~4µs. Uses bigram prefiltering (97% fewer Levenshtein computations) plus a length filter before the actual distance calculation. (Rough sketch below.)
- Hybrid pattern — BM25 finds candidate documents, vibe-index pins exact lines within them. This is where it shines: embeddings/BM25 for recall, vibe-index for precision. (Sketch at the end of the post.)
- Snippet highlighting — matched tokens are wrapped in bold markers, e.g. `fn authenticate(user: &str)` instead of the full line. Makes it trivial to spot where the match happened.
- File size weighting — larger files get a logarithmic confidence boost. A match in a 1000-token file ranks higher than the same match in a 10-token file. Capped at 1.0.
- Incremental indexing — `update_file()` updates changed files without re-indexing everything. Uses FNV-1a content hashing to detect changes. Only removes old token positions and re-indexes the changed file. Subsequent file ranges get shifted automatically. (Rough sketch below.)
- Unified NL search — `search("where is the authenticate function")` runs phrase search on the parsed query plus fuzzy search on significant words, merges and deduplicates by position, and sorts by confidence. One API call for everything.
- Example project — `cargo run --example real_codebase_search -- <dir> <query>` indexes all .rs files in a directory and searches them. Good starting point for integrating it into your own tooling.

Architecture stays the same:

Token stream → TokenLexicon (u32 IDs) → Roaring Bitmap per token → Anchor-and-offset phrase scan

No embeddings. No vectors. No GPU. Just bitmaps and math.
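To make that last line concrete, here is a minimal sketch of an anchor-and-offset phrase scan over one roaring bitmap per token, using the `roaring` crate. The struct and field names are illustrative, not vibe-index's actual API:

```rust
use roaring::RoaringBitmap;
use std::collections::HashMap;

/// Illustrative index shape: one position bitmap per token id.
struct Index {
    lexicon: HashMap<String, u32>,         // token -> u32 id
    postings: HashMap<u32, RoaringBitmap>, // token id -> positions in the token stream
}

impl Index {
    /// Find every position where `phrase` starts, by anchoring on the first
    /// token and checking that each following token appears at anchor + offset.
    fn phrase_positions(&self, phrase: &[&str]) -> Vec<u32> {
        // Resolve tokens to ids; any unknown token means the phrase can't match (early exit).
        let ids: Option<Vec<u32>> = phrase.iter().map(|t| self.lexicon.get(*t).copied()).collect();
        let Some(ids) = ids else { return Vec::new() };
        let Some(anchor_bm) = self.postings.get(&ids[0]) else { return Vec::new() };

        anchor_bm
            .iter()
            .filter(|&anchor| {
                ids[1..].iter().enumerate().all(|(i, id)| {
                    self.postings
                        .get(id)
                        .map_or(false, |bm| bm.contains(anchor + i as u32 + 1))
                })
            })
            .collect()
    }
}
```

The early exit on an unknown token is what makes the "phrase not found" case so cheap (the 80ns number above): no bitmap scan ever happens.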
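The binary-search file lookup can be sketched like this, assuming each file is stored with its starting offset in the global token stream, sorted by that offset (field names are mine, not the crate's):

```rust
/// Illustrative file table entry: path plus where the file starts in the token stream.
struct FileEntry {
    path: String,
    start_token: u32,
}

/// Map a global token position back to the file containing it in O(log n)
/// instead of a linear scan, via binary search over the sorted start offsets.
fn file_for_position(files: &[FileEntry], pos: u32) -> Option<&FileEntry> {
    // partition_point returns the index of the first entry with start_token > pos,
    // so the entry just before it is the file that contains `pos`.
    let idx = files.partition_point(|f| f.start_token <= pos);
    idx.checked_sub(1).map(|i| &files[i])
}
```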
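And a sketch of the typo-tolerance pipeline as described: a length filter and a bigram prefilter in front of a plain DP Levenshtein. The 0.5 overlap threshold is an assumption for illustration, not the crate's actual cutoff:

```rust
use std::collections::HashSet;

/// Character bigrams of a token, e.g. "proces" -> {pr, ro, oc, ce, es}.
fn bigrams(s: &str) -> HashSet<(char, char)> {
    let chars: Vec<char> = s.chars().collect();
    chars.windows(2).map(|w| (w[0], w[1])).collect()
}

/// Cheap similarity check: share of query bigrams present in the candidate.
fn bigram_overlap(query: &str, candidate: &str) -> f32 {
    let q = bigrams(query);
    if q.is_empty() { return 0.0; }
    let c = bigrams(candidate);
    q.intersection(&c).count() as f32 / q.len() as f32
}

/// Only candidates that survive the length filter and the bigram prefilter
/// reach the (comparatively expensive) Levenshtein computation.
fn fuzzy_candidates<'a>(query: &str, lexicon: &'a [String], max_dist: usize) -> Vec<&'a String> {
    lexicon
        .iter()
        .filter(|t| t.len().abs_diff(query.len()) <= max_dist)        // length filter
        .filter(|t| bigram_overlap(query, t.as_str()) >= 0.5)         // bigram prefilter (assumed threshold)
        .filter(|t| levenshtein(query, t.as_str()) <= max_dist)       // actual edit distance
        .collect()
}

/// Classic dynamic-programming Levenshtein distance.
fn levenshtein(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}
```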
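Change detection for incremental indexing is based on FNV-1a content hashing. The hash itself is the standard 64-bit FNV-1a; the `update_file` wiring below is only a guess at the shape, not the real implementation:

```rust
use std::collections::HashMap;

/// Standard 64-bit FNV-1a over the file's bytes.
fn fnv1a_64(data: &[u8]) -> u64 {
    const FNV_OFFSET: u64 = 0xcbf29ce484222325;
    const FNV_PRIME: u64 = 0x100000001b3;
    data.iter()
        .fold(FNV_OFFSET, |hash, &byte| (hash ^ byte as u64).wrapping_mul(FNV_PRIME))
}

/// Hypothetical update path: skip re-indexing when the content hash is unchanged.
fn update_file(hashes: &mut HashMap<String, u64>, path: &str, content: &str) -> bool {
    let new_hash = fnv1a_64(content.as_bytes());
    if hashes.get(path) == Some(&new_hash) {
        return false; // unchanged, nothing to do
    }
    // Changed: remove the old token positions for this file, re-index it,
    // and shift the ranges of subsequent files (omitted here).
    hashes.insert(path.to_string(), new_hash);
    true
}
```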
Comparison to what everyone else uses:

| Approach | Precision | Latency | Memory (50K tokens) |
|---|---|---|---|
| Vibe Index | Exact position | 70ns–120µs | ~0.5 MB |
| BM25 | Document-level | 50–500µs | ~2 MB |
| FAISS (embeddings) | Semantic (~0.85) | 5–20ms | ~20 MB |
| Tantivy | Document-level | 50–200µs | ~3 MB |

The thing is, vibe-index doesn't replace BM25 or embeddings. It complements them. The real pattern is:

Embeddings/BM25 → find relevant chunks
Vibe Index → find exact lines inside them

(A minimal sketch of this flow is at the end of the post.)

Limitations:

- No semantic search. "login" ≠ "authenticate". That's the tradeoff.
- BM25 IDF is computed on the fly. Fine for small doc sets.
- Hot layer size is immutable — `max_hot_tokens` is fixed at creation.
- No SIMD. Tested AVX2/AVX-512 on Roaring iteration — 64–115% slower. Run-compression doesn't benefit from fixed-width SIMD ops.

100% Rust, no external ML dependencies, 74 passing tests.

Any thoughts on this approach? Any suggestions for improving it are very, very welcome!!!

https://github.com/mladenpop-oss/vibe-index
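PS: for anyone who wants to wire up the hybrid pattern, here is a minimal sketch of the flow. `Recall` and `Precision` are stand-in traits, not vibe-index's real API (see the repo for that):

```rust
/// Stand-in interfaces for the two layers; names are illustrative only.
trait Recall {
    fn top_documents(&self, query: &str, k: usize) -> Vec<String>; // file paths
}
trait Precision {
    fn search_in_file(&self, path: &str, query: &str) -> Vec<(usize, String)>; // (line, snippet)
}

/// The hybrid pattern: BM25 (or embeddings) for recall over whole documents,
/// positional bitmap search for precision inside the surviving candidates.
fn hybrid_search<R: Recall, P: Precision>(
    recall: &R,
    precision: &P,
    query: &str,
    k: usize,
) -> Vec<(String, usize, String)> {
    recall
        .top_documents(query, k)          // 1. recall: candidate files
        .into_iter()
        .flat_map(|path| {
            precision
                .search_in_file(&path, query) // 2. precision: exact lines within them
                .into_iter()
                .map(move |(line, snippet)| (path.clone(), line, snippet))
        })
        .collect()
}
```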

Originally posted by u/Lost-Health-8675 on r/ClaudeCode