Original Reddit post

Hi all, I’m building a system that takes a circuit image (breadboard photo or schematic) and answers questions about it. I’m looking for practical, implementation-focused advice (not just paper links).

Goal

  • Input: image + question
  • Output: a generated explanation (not just labels)

Example:

  • Q: “What is this circuit?”
  • A: “LED flasher using transistor… (how it works, current flow, etc.)”

What I plan to use

  • VLM: BLIP-2 or LLaVA (for image + question understanding)
  • LLM: any good text model for explanation
  • Python + HuggingFace + PyTorch
  • Simple UI (Streamlit)
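
For the VLM piece, a minimal zero-shot query with BLIP-2 through HuggingFace transformers could look like the sketch below. The checkpoint (Salesforce/blip2-opt-2.7b) and the image path are placeholder assumptions, not recommendations:

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Checkpoint is an assumption; any BLIP-2 variant on the Hub loads the same way.
model_id = "Salesforce/blip2-opt-2.7b"
processor = Blip2Processor.from_pretrained(model_id)
model = Blip2ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("circuit.jpg")  # placeholder input image
prompt = "Question: What is this circuit? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)

out = model.generate(**inputs, max_new_tokens=120)
print(processor.batch_decode(out, skip_special_tokens=True)[0].strip())
```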

My current pipeline idea

Image → VLM (extract components + description) → LLM (generate explanation) → output
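
As a concrete sketch of the second stage of that chain: the VLM’s description gets handed to a text LLM with an explanation-oriented prompt. The instruct model and the prompt wording here are assumptions, not tested choices:

```python
from transformers import pipeline

# Stage 2: turn the VLM's raw description into an explanation.
# Model choice is an assumption; any local instruct/chat LLM works.
llm = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto",
)

def explain(vlm_description: str, question: str) -> str:
    # vlm_description would come from the BLIP-2/LLaVA call in stage 1.
    prompt = (
        "You are an electronics tutor. A vision model described this circuit as:\n"
        f"{vlm_description}\n\n"
        f"Question: {question}\n"
        "Explain what the circuit is and how current flows through it."
    )
    out = llm(prompt, max_new_tokens=300, return_full_text=False)
    return out[0]["generated_text"]
```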

What I need help with

Best architecture:

  • Direct VLM answer vs VLM → LLM chain — which works better in practice?

Circuit-specific understanding:

  • Any datasets or tricks for diagrams/breadboards?
  • Is something like CircuitVQA worth using?

Fine-tuning vs prompt-only:

  • Is LoRA/QLoRA worth it here, or can I stay zero-shot?

Detection + reasoning:

  • Should I add a detector (YOLO/Detectron) for components before the VLM? (See the detector sketch after this list.)

Evaluation:

  • How do you evaluate answers for VQA-style systems beyond BLEU/F1?

Minimal working stack:

  • If you had to build an MVP in 2–3 days, what exact stack would you pick?
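
On the detection question, the detector-first variant usually looks something like the sketch below (Ultralytics YOLO assumed). The stock yolov8n.pt weights only know COCO classes, so a real run would need a model fine-tuned on circuit-component classes first; the structured detections would then be injected into the VLM/LLM prompt:

```python
from ultralytics import YOLO

# Stock COCO weights are a placeholder: a usable detector would be
# fine-tuned on circuit-component classes (resistor, LED, transistor, ...).
detector = YOLO("yolov8n.pt")
results = detector("breadboard.jpg")  # placeholder image path

components = []
for box in results[0].boxes:
    name = results[0].names[int(box.cls)]
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    components.append(f"{name} at ({x1:.0f},{y1:.0f})-({x2:.0f},{y2:.0f})")

# This list would be prepended to the VLM prompt as extra context.
print("Detected components:", "; ".join(components))
```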

Constraints

  • Prefer open models / local or free options
  • Focus on generative output (explanations), not just classification

If you’ve built something similar or have pointers (repos, configs, pitfalls), I’d really appreciate it. Thanks!

Originally posted by u/vishal55282 on r/ArtificialInteligence