Quick context: Claude can see images but can’t stream video, which was blocking me on a lot of workflows, so I built a skill to work around it.

How it works:

It pulls the YouTube transcript (captions first, Whisper if none are available), grabs a still frame every few seconds with ffmpeg, then matches each frame with the sentence being spoken at that moment. Claude reads the frames and transcript together and writes structured notes: a TL;DR, a timeline, key quotes, and visual notes.

It works with YouTube URLs and local video files, and runs in Claude Code, Claude Desktop, and apps built on the Agent SDK.

The 4 use cases that inspired me to build this:

1. If there’s a video you don’t fully understand, have Claude watch it before you start planning. I started building a custom extension for downloading courses and began coding it with Claude this way. It’s working very well.
2. Someone was showing me a funnel by sending screenshots from a video. Instead of explaining each frame myself, I had Claude watch the entire video, including the screenshots and DM conversations in it. That gave it a live example of how the conversations play out.
3. I’m creating my own Opus Clip-style Claude Code skill. The final output is a huge improvement over the first example Claude generated, because I showed it a demo of my ideal reel.
4. If you like a YouTuber’s editing style, point Claude at two or three of their videos and let it figure out the style. With Remotion and Hyperframes, you can then edit your own videos in that same style.

Repo + tutorial:

Repo: https://github.com/Newuxtreme/watch-video-skill (MIT)
5-min tutorial: https://www.youtube.com/watch?v=U10NUi4FqnU

Curious what you’d use it for: courses, podcasts, tutorials, something I haven’t thought of?
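For anyone wondering what the frame-to-transcript matching step might look like, here is a rough Python sketch. It is not taken from the repo: the `Segment` type and the function names (`ffmpeg_frame_command`, `frame_times`, `align`) are made up for illustration. It also assumes the transcript is already a list of timed segments sorted by start time, and that a still taken every N seconds lands at roughly 0, N, 2N, ... seconds.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript line with its timing, in seconds (hypothetical shape)."""
    start: float
    end: float
    text: str


def ffmpeg_frame_command(video_path, out_dir, every_seconds=5):
    """Build (but do not run) an ffmpeg command that saves one still every N seconds."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps=1/{every_seconds}",  # fps filter: one output frame per N seconds
        f"{out_dir}/frame_%04d.jpg",
    ]


def frame_times(duration, every_seconds=5):
    """Approximate timestamps of the extracted stills: 0, N, 2N, ... below duration."""
    times = []
    t = 0.0
    while t < duration:
        times.append(t)
        t += every_seconds
    return times


def align(times, segments):
    """Pair each frame time with the text being spoken then, or None during silence.

    Assumes `segments` is sorted by start time and does not overlap.
    """
    starts = [s.start for s in segments]
    pairs = []
    for t in times:
        # Index of the last segment that starts at or before t.
        i = bisect_right(starts, t) - 1
        spoken = segments[i].text if i >= 0 and t < segments[i].end else None
        pairs.append((t, spoken))
    return pairs
```

The list from `ffmpeg_frame_command` can be handed to `subprocess.run` to extract the stills. The pairs from `align` are the kind of thing you would pass to Claude alongside the images, so each frame arrives with the sentence spoken over it.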
Originally posted by u/newuxtreme on r/ClaudeCode
