Original Reddit post

Hey everyone! When I was trying to fine-tune Llama 3 on some internal company data, I realized I couldn't use standard cloud generation services because of strict privacy/compliance rules (especially with the new DPDP regulations here in India). I needed a way to generate RAG evaluation triplets and expand tiny seed datasets into thousands of rows without the data ever leaving my machine.

So I built Synthetic Data Factory (on my site jaconir.online).

How it works under the hood:

- It uses web-llm to load a 1.5 GB Gemma-2B model directly into your browser's IndexedDB.
- The heavy inference runs in a Web Worker via WebGPU, so the main UI never lags.
- If you have Ollama running on localhost:11434, it auto-detects it and routes generation to your dedicated GPU instead.
- It has a built-in PII Scrubber that highlights names/emails locally before you even start the generation loop.

It's completely free, no login required, and open to anyone who needs to quickly forge JSONL files for fine-tuning or RAG evaluation without the cloud overhead.

I'd love some feedback from the local AI community on the "Scenario Architect" templates I've included for RAG testing. Is there a specific edge-case template you usually test for?
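The Ollama auto-detect described above can be sketched roughly like this: probe the local server and fall back to the in-browser WebGPU path if it isn't reachable. This is a minimal sketch under my own assumptions, not the tool's actual code; `detectBackend` and the injected `probe` parameter are hypothetical names (injecting the fetch function keeps the routing logic testable). `/api/tags` is Ollama's model-list endpoint, a cheap way to check liveness.

```typescript
// Hypothetical sketch of backend routing: prefer a local Ollama
// server, otherwise fall back to in-browser WebGPU inference.
type Backend = "ollama" | "webgpu";

async function detectBackend(
  // probe is injected (e.g. window.fetch in the browser) so the
  // decision logic can be exercised without a live server.
  probe: (url: string) => Promise<{ ok: boolean }>
): Promise<Backend> {
  try {
    // /api/tags lists installed models; any 2xx means Ollama is up.
    const res = await probe("http://localhost:11434/api/tags");
    return res.ok ? "ollama" : "webgpu";
  } catch {
    // Connection refused / timed out: no local server, stay in-browser.
    return "webgpu";
  }
}
```

In the real app you would presumably also want a short timeout on the probe (e.g. via `AbortController`) so a firewalled port doesn't stall startup.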
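For readers curious what a local PII scrubber might look like: a minimal regex-based sketch for emails and phone-like numbers is below. The actual tool may well use smarter detection (NER, locale-aware patterns); the patterns and the `scrubPII` name here are illustrative assumptions, not its implementation.

```typescript
// Illustrative local PII scrubbing: replace emails and US-style
// phone numbers with placeholder tokens before text ever leaves
// the machine. Regexes are deliberately simple, not exhaustive.
function scrubPII(text: string): string {
  const email = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
  const phone = /\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g;
  return text.replace(email, "[EMAIL]").replace(phone, "[PHONE]");
}
```

The same pass could return match offsets instead of replacing, which is how you'd drive the "highlight before generation" UI the post describes.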
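And since the output format mentioned is JSONL (one JSON object per line, the shape most fine-tuning pipelines ingest), here is a tiny sketch of serializing RAG evaluation triplets to it. The `Triplet` field names are assumptions; the tool's actual schema may differ.

```typescript
// Sketch: serialize RAG evaluation triplets to JSONL — one
// self-contained JSON object per line, no trailing newline.
// Field names (question/context/answer) are assumed, not confirmed.
interface Triplet {
  question: string;
  context: string;
  answer: string;
}

function toJSONL(rows: Triplet[]): string {
  return rows.map((r) => JSON.stringify(r)).join("\n");
}
```

In the browser, the resulting string can be wrapped in a `Blob` and offered as a download without any server round-trip, which fits the "nothing leaves your machine" constraint.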

Originally posted by u/Impressive_Honey8334 on r/ArtificialInteligence