Hey everyone,
When I was trying to fine-tune Llama 3 on some internal company data, I realized I couldn’t use standard cloud generators because of strict privacy/compliance rules (especially with the new DPDP regulations here in India).
I needed a way to generate RAG evaluation triplets and expand tiny seed datasets into thousands of rows without the data ever leaving my machine. So, I built Synthetic Data Factory (on my site jaconir.online).
How it works under the hood:
It uses web-llm to download a ~1.5 GB Gemma-2B model and cache it in your browser’s IndexedDB, so subsequent loads are instant.
The heavy inference runs in a Web Worker via WebGPU, so the main UI thread never lags.
If you have Ollama running on localhost:11434, it auto-detects it and routes the generation to your dedicated GPU instead.
It has a built-in PII Scrubber that highlights names/emails locally before you even start the generation loop.
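For the curious, the Ollama fallback logic is conceptually simple. Here’s a minimal sketch of how that kind of detection can work — this is an illustration, not the tool’s actual code; `detectOllama` and `chooseBackend` are hypothetical names, though `GET /api/tags` is Ollama’s real model-listing endpoint:

```javascript
// Hypothetical sketch of the backend routing described above: probe the
// local Ollama REST API, and fall back to in-browser web-llm otherwise.
async function detectOllama(base = "http://localhost:11434") {
  try {
    // Ollama's HTTP API exposes GET /api/tags, which lists installed models.
    const res = await fetch(`${base}/api/tags`, {
      signal: AbortSignal.timeout(1000), // don't hang the UI if Ollama is down
    });
    return res.ok;
  } catch {
    return false; // connection refused, timed out, etc.
  }
}

// Pure routing decision, kept separate from the network probe so it's testable.
function chooseBackend(ollamaUp) {
  return ollamaUp ? "ollama" : "web-llm";
}
```

Keeping the probe and the decision separate also makes it easy to add more backends later.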
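To give a feel for what a local PII pass looks like: the sketch below is my own illustration, not the tool’s actual scrubber (which would need something smarter than regexes for person names); it covers only the email case:

```javascript
// Illustrative-only PII scrub: replace anything email-shaped with a tag.
// A real scrubber would also need NER or name dictionaries for people.
const EMAIL_RE = /[\w.+-]+@[\w-]+(\.[\w-]+)+/g;

function scrubEmails(text) {
  return text.replace(EMAIL_RE, "[EMAIL]");
}
```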
It’s completely free, no login required, and open for anyone who needs to quickly forge JSONL files for fine-tuning or RAG evaluation without the cloud overhead.
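For anyone unfamiliar with the format: JSONL is just one JSON object per line, which is why it’s the default for fine-tuning pipelines. A tiny example (the field names here are illustrative, not necessarily what the tool emits):

```javascript
// JSONL = one JSON object per line, newline-separated.
const rows = [
  { instruction: "Summarise the ticket", input: "Printer jams on tray 2", output: "Hardware: paper jam" },
  { instruction: "Classify sentiment", input: "Great support!", output: "positive" },
];
const jsonl = rows.map((r) => JSON.stringify(r)).join("\n");
```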
I’d love some feedback from the local AI community on the “Scenario Architect” templates I’ve included for RAG testing. Is there a specific edge-case template you usually test for?
Originally posted by u/Impressive_Honey8334 on r/ArtificialInteligence
