Original Reddit post

I wanted to share a tool I built: NoobScribe (because my nickname is meganoob1337 ^^). The base was parakeet-diarized (link in ATTRIBUTIONS(.)md in the repository). It exposes a Whisper-compatible API for transcribing audio, although my main additions are the web UI and the endpoints for managing recordings, transcripts and speakers.

It runs in Docker (on CPU, or on GPU with the NVIDIA Container Toolkit) and uses pyannote.audio for diarization and nvidia/canary-1b-v2 for transcription.

There are two ways to add recordings: upload an audio file, or record your desktop audio (via browser screen share) and/or your microphone. The audio is then transcribed with Canary-1b-v2 and diarized with pyannote.audio.

After transcription and diarization are complete, there is an option to save the detected speakers (their pyannote embeddings) to the vector DB (Chroma), which replaces the generic speaker names (SPEAKER_00 etc.) with the name you entered. When you add a new speaker, or a new embedding for an existing speaker, it also checks existing transcripts for matching embeddings and updates them retroactively. A speaker can have multiple embeddings: when you use different microphones, the embeddings don't always match, so storing several makes speaker recognition more accurate.

Everything runs locally on your machine. You only need Docker and an HF_TOKEN (the token is only required for the diarization feature, because the pyannote model is gated).

I built this to help myself make better transcripts of meetings etc. that I can later summarize with an LLM. Speaker diarization helps a lot in that regard compared to plain transcription. I just wanted to share this with you guys in case someone has a use for it. I used Cursor to help me develop my features, although I'm a developer by trade (9+ years).
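To make the speaker-matching idea above a bit more concrete, here is a rough sketch in plain Python. Each named speaker keeps several embeddings, a diarized cluster (SPEAKER_00 etc.) takes the name of the best-matching speaker if the cosine similarity clears a threshold, and rerunning the relabeling after adding a speaker updates older transcripts. This is not NoobScribe's actual code: `SpeakerStore` is a hypothetical in-memory stand-in for the Chroma collection, and the 0.7 threshold and the toy 3-dimensional vectors are made up for illustration (real pyannote embeddings are much larger).

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class SpeakerStore:
    """Hypothetical in-memory stand-in for the Chroma collection.

    Maps a speaker name to SEVERAL embeddings (e.g. one per microphone).
    """
    embeddings: dict[str, list[list[float]]] = field(default_factory=dict)

    def add(self, name: str, embedding: list[float]) -> None:
        self.embeddings.setdefault(name, []).append(list(embedding))

    def best_match(self, embedding: list[float], threshold: float = 0.7) -> str | None:
        """Return the best-matching speaker name, or None if nobody clears the threshold."""
        best_name, best_score = None, threshold
        for name, stored in self.embeddings.items():
            # A speaker's score is its best score over all of its stored embeddings.
            score = max(cosine_similarity(embedding, e) for e in stored)
            if score >= best_score:
                best_name, best_score = name, score
        return best_name


def relabel(segments: list[dict], cluster_embeddings: dict[str, list[float]],
            store: SpeakerStore, threshold: float = 0.7) -> list[dict]:
    """Swap generic labels (SPEAKER_00, ...) for stored speaker names.

    Returns new segment dicts and leaves the raw diarization output untouched,
    so the same raw transcript can be relabeled again later (retroactively).
    """
    names = {}
    for label, emb in cluster_embeddings.items():
        name = store.best_match(emb, threshold)
        if name is not None:
            names[label] = name
    return [dict(seg, speaker=names.get(seg["speaker"], seg["speaker"])) for seg in segments]


if __name__ == "__main__":
    store = SpeakerStore()
    store.add("Alice", [1.0, 0.0, 0.0])
    store.add("Alice", [0.8, 0.6, 0.0])  # same person, different microphone

    # Raw diarization output plus one (toy) embedding per detected cluster.
    segments = [
        {"start": 0.0, "end": 2.5, "speaker": "SPEAKER_00", "text": "Hi"},
        {"start": 2.5, "end": 4.0, "speaker": "SPEAKER_01", "text": "Hello"},
    ]
    clusters = {"SPEAKER_00": [0.79, 0.61, 0.0], "SPEAKER_01": [0.0, 0.0, 1.0]}

    print([s["speaker"] for s in relabel(segments, clusters, store)])
    # prints ['Alice', 'SPEAKER_01']

    # Adding a new speaker and relabeling again updates the old transcript.
    store.add("Bob", [0.0, 0.1, 1.0])
    print([s["speaker"] for s in relabel(segments, clusters, store)])
    # prints ['Alice', 'Bob']
```

Taking the best score over all of a speaker's stored embeddings is what lets the same person recorded through different microphones still match. A setup backed by Chroma could hand the nearest-neighbor search to a collection query instead of this brute-force loop.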
I didn't use AI to write this text, so bear with me on the form, but I didn't want it to feel too generic. I hope someone will actually look at this project and maybe even expand on it or give feedback. Also, feel free to ask questions here.
submitted by /u/meganoob1337

Originally posted by u/meganoob1337 on r/ArtificialInteligence