Original Reddit post

Disclosure: I’m the author of Skill Seekers, an open-source (MIT) CLI tool that converts documentation sources into SKILL.md files for Claude Code. It’s free and published on PyPI. v3.2.0 just shipped with a video extraction pipeline — this post walks through how it works technically.

## The problem

You watch a coding tutorial, then need Claude Code to help you implement what you learned. But Claude doesn’t have the tutorial context — the code shown on screen, the order things were built, the gotchas the instructor mentioned. You end up copy-pasting snippets manually.

## What the video pipeline does

```bash
skill-seekers video --url https://youtube.com/watch?v… --enhance-level 2
```

The pipeline extracts a structured SKILL.md from a video through 5 stages:

1. **Transcript extraction** — 3-tier fallback: YouTube Transcript API → yt-dlp subtitles → faster-whisper local transcription
2. **Keyframe detection** — Scene change detection pulls key frames, then classifies each as code editor, terminal, slides, webcam, or other
3. **Per-panel OCR** — IDE screenshots get split into sub-panels (code area, terminal, file tree). Each panel is OCR’d independently using an EasyOCR + pytesseract ensemble with per-line confidence merging
4. **Code timeline tracking** — Tracks which lines were added, changed, or removed across frames
5. **Two-pass AI enhancement** — The interesting part (details below)

## Two-pass enhancement workflow

**Pass 1 — Reference cleaning:** The raw OCR output is noisy. The pipeline sends each reference file (OCR text + transcript context) to Claude, asking it to reconstruct the Code Timeline. Claude uses the narrator’s words to figure out what the code should say when OCR garbled it (`l` vs `1`, `O` vs `0`, `rn` vs `m`). It also strips UI elements that leaked in (Inspector panels, tab bar text, line numbers).

**Pass 2 — SKILL.md generation:** Takes the cleaned references and generates the final structured skill with setup steps, code examples, and concepts.
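The tiered transcript fallback in stage 1 boils down to trying sources in order until one succeeds. Here's a minimal sketch of that pattern — the extractor functions are hypothetical stand-ins, not the tool's actual API:

```python
# Sketch of a tiered fallback: try each transcript source in order,
# moving to the next tier when one fails. Extractor names are hypothetical.
def fetch_transcript(url, extractors):
    """Return (tier_name, transcript) from the first extractor that works."""
    errors = []
    for name, extract in extractors:
        try:
            return name, extract(url)
        except Exception as exc:  # a real pipeline would catch narrower errors
            errors.append((name, exc))
    raise RuntimeError(f"all transcript tiers failed: {errors}")

def _no_captions(url):
    raise IOError("no captions available")

# Tier order mirrors the post: API captions -> yt-dlp subtitles -> local Whisper.
tiers = [
    ("youtube_api", _no_captions),                       # pretend tier 1 fails
    ("yt_dlp_subs", lambda url: "WEBVTT ..."),           # pretend tier 2 works
    ("faster_whisper", lambda url: "local transcript"),  # never reached here
]

source, text = fetch_transcript("https://youtube.com/watch?v=example", tiers)
# source == "yt_dlp_subs"
```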
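Stage 2's scene-change detection can be illustrated with a toy frame-difference threshold (the real pipeline presumably uses a proper video library; this is just the core idea on raw pixel lists):

```python
# Toy scene-change detector: flag a keyframe when the mean absolute
# pixel difference from the previous frame exceeds a threshold.
def detect_keyframes(frames, threshold=30.0):
    keyframes = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        prev, curr = frames[i - 1], frames[i]
        diff = sum(abs(p - c) for p, c in zip(prev, curr)) / len(curr)
        if diff > threshold:
            keyframes.append(i)
    return keyframes

# Three "frames" of 4 grayscale pixels each: frame 2 is a hard cut.
idx = detect_keyframes([[10, 10, 10, 10], [12, 11, 10, 9], [200, 200, 200, 200]])
# idx == [0, 2]
```

Each flagged frame would then go to the classifier (code editor / terminal / slides / webcam / other) before any OCR runs.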
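Stage 3's per-line confidence merging between the two OCR engines might look roughly like this — a simplified sketch, assuming both engines return `(text, confidence)` pairs per line; the real merging logic is surely more involved:

```python
# Merge OCR results line-by-line, keeping whichever engine was more
# confident about each line.
def merge_ocr_lines(easyocr_lines, tesseract_lines):
    merged = []
    for a, b in zip(easyocr_lines, tesseract_lines):
        merged.append(a if a[1] >= b[1] else b)
    # If one engine detected extra lines, keep them as-is.
    longer = easyocr_lines if len(easyocr_lines) > len(tesseract_lines) else tesseract_lines
    merged.extend(longer[len(merged):])
    return [text for text, _conf in merged]

lines = merge_ocr_lines(
    [("public class Card {", 0.91), ("int va1ue;", 0.55)],
    [("pub1ic class Card {", 0.62), ("int value;", 0.88)],
)
# lines == ["public class Card {", "int value;"]
```

Note how each engine garbles a different line, so the ensemble recovers a cleaner result than either alone.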
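Stage 4's code timeline tracking is essentially diffing the OCR'd code of consecutive keyframes. A minimal stdlib sketch of the idea (not the tool's actual implementation):

```python
import difflib

# Diff the code text of two consecutive keyframes to record what the
# instructor added or removed between them.
def frame_delta(prev_code, curr_code):
    delta = {"added": [], "removed": []}
    for line in difflib.unified_diff(prev_code, curr_code, lineterm="", n=0):
        if line.startswith("+") and not line.startswith("+++"):
            delta["added"].append(line[1:])
        elif line.startswith("-") and not line.startswith("---"):
            delta["removed"].append(line[1:])
    return delta

d = frame_delta(
    ["class Card:", "    pass"],
    ["class Card:", "    def __init__(self):", "        self.value = 0"],
)
# d["removed"] == ["    pass"]
```

Accumulating these deltas across the video yields the "order things were built" context that the post says Claude normally lacks.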
You can define custom enhancement workflows in YAML:

```yaml
stages:
  - name: ocr_code_cleanup
    prompt: "Clean OCR artifacts from code blocks…"
  - name: tutorial_synthesis
    prompt: "Synthesize a teaching narrative…"
```

Five bundled presets: `default`, `minimal`, `security-focus`, `architecture-comprehensive`, `api-documentation`. Or write your own.

## Technical challenges worth sharing

**OCR on code editors is hard.** IDE decorations (line numbers, collapse markers, tab bars) leak into the text. Built `_clean_ocr_line()` and `_fix_intra_line_duplication()` to handle cases where both OCR engines return overlapping results like `gpublic class Card Jpublic class Card`.

**Frame classification saves everything.** Webcam frames produce pure garbage when OCR’d. Skipping WEBCAM and OTHER frame types cut junk output by ~40%.

**The two-pass approach was a significant quality jump over single-pass.** Giving Claude the transcript alongside the noisy OCR means it has context to reconstruct what single-pass enhancement would just guess at.

**GPU setup is painful.** PyTorch installs the wrong CUDA/ROCm variant if you just `pip install`. Built `--setup`, which runs `nvidia-smi` / `rocminfo` to detect the GPU and installs from the correct index URL.

## Beyond video

The tool also processes:

- Documentation websites (presets for React, Vue, Django, FastAPI, Godot, Kubernetes, and more)
- GitHub repos (AST analysis across 9 languages, design pattern detection)
- PDFs and Word docs
- Outputs to Claude, Gemini, OpenAI, or RAG formats (LangChain, Pinecone, ChromaDB, etc.)

## Try it

```bash
# Transcript-only (no GPU needed)
skill-seekers video --url <youtube-url>

# Full visual extraction (needs GPU setup first)
skill-seekers video --setup
skill-seekers video --url <youtube-url> --visual --enhance-level 2
```
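An aside on the OCR-cleanup challenge mentioned above: the kind of work `_clean_ocr_line()` and `_fix_intra_line_duplication()` describe could be sketched like this. This is a hypothetical reconstruction of the idea, not the tool's actual code:

```python
import difflib
import re

# Hypothetical cleanup in the spirit of _clean_ocr_line(): strip leading
# line numbers and gutter glyphs that IDE chrome leaks into OCR output.
def clean_ocr_line(line):
    line = re.sub(r"^\s*\d+[:.]?\s+", "", line)  # "42  public ..." -> "public ..."
    return re.sub(r"^[>»|]+\s*", "", line)       # fold markers / gutter chars

# Hypothetical take on _fix_intra_line_duplication(): when both engines'
# readings of the same line got concatenated, keep just one half.
def fix_intra_line_duplication(line):
    words = line.split()
    half = len(words) // 2
    a, b = " ".join(words[:half]), " ".join(words[half:])
    if a and difflib.SequenceMatcher(None, a, b).ratio() > 0.8:
        return b  # halves are near-duplicates of each other; keep one
    return line

print(clean_ocr_line("42  public class Card"))
print(fix_intra_line_duplication("gpublic class Card Jpublic class Card"))
```

The de-duplicated line may still carry a garbled glyph or two; per the workflow above, that residue is what Pass 1 hands to Claude with the transcript for reconstruction.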
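And on the GPU-setup pain point: a `--setup`-style detector essentially probes for vendor tools and picks the matching PyTorch wheel index. A rough sketch under that assumption — the exact version suffixes in the URLs vary by PyTorch release, so treat them as illustrative:

```python
import shutil

# Hypothetical GPU autodetection for choosing a PyTorch wheel index.
# Probes for vendor CLIs on PATH, mirroring the nvidia-smi / rocminfo
# checks described in the post.
def torch_index_url():
    if shutil.which("nvidia-smi"):
        return "https://download.pytorch.org/whl/cu121"    # CUDA build
    if shutil.which("rocminfo"):
        return "https://download.pytorch.org/whl/rocm6.0"  # ROCm build
    return "https://download.pytorch.org/whl/cpu"          # CPU fallback

# Usage idea: pip install torch --index-url "$(python detect_gpu.py)"
print(torch_index_url())
```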
2,540 tests passing. Happy to answer questions about the OCR pipeline, enhancement workflows, or the panel detection approach.

Originally posted by u/Critical-Pea-8782 on r/ClaudeCode