Original Reddit post

I had a few use cases that needed video summaries and transcripts, and I kept going back and forth between Claude and Gemini. So I asked Gemini about its internal video understanding architecture, then asked Claude to recreate it. It's undoubtedly not as good as Gemini, for the obvious multimodality reasons, but I've tried it on a bunch of videos and it works surprisingly well. Try it out: https://pypi.org/project/vidclaude/

Originally posted by u/dafqnumb on r/ClaudeCode