Original Reddit post

Hey folks, I’ve been working on a small edge AI project for in-car SOS detection and wanted to get some advice from people who’ve worked with audio ML.

The idea is pretty simple: a mic continuously listens inside the car, audio gets chunked into small segments, embeddings are generated with YAMNet, and then I run local vector similarity search against distress sounds like screams, sirens, crashes, etc.

For longer sounds, things are actually working surprisingly well. Screams, horns, sirens, and similar sounds are getting detected pretty reliably. The issue is gunshots. Since a gunshot is extremely short (~0.2 s), it basically disappears inside a 1-second audio chunk, especially with background car noise like engine vibration or the AC running. The important acoustic features just get diluted.

Things I’ve already tried:

- Added gunshot samples from UrbanSound8K
- Reduced sequential hit requirements for impulsive sounds
- Added dynamic thresholds + RMS/amplitude gating
- Tuned similarity thresholds separately for different classes

These changes improved things a bit, but detection is still inconsistent compared to longer distress sounds.

Wanted to ask:

- Are there better gunshot datasets people recommend?
- Any preprocessing tricks specifically for transient/impulsive sounds?
- Is YAMNet just not ideal for this type of problem?
- Any lightweight edge models that work better for short impulse detection?

Would genuinely appreciate any pointers, papers, repos, or ideas from people who’ve dealt with similar audio problems.

submitted by /u/niga_chan
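For context, the "RMS/amplitude gating" idea mentioned above can be taken one step further: instead of only gating the whole 1-second chunk, locate the impulsive frames themselves and embed a tight window around them, so a ~0.2 s gunshot isn't averaged away by a second of cabin noise. Below is a minimal NumPy sketch of such a transient localizer; the frame size, spike ratio, and function name are illustrative assumptions, not values from the post:

```python
import numpy as np

def detect_transients(audio, sr=16000, frame_ms=10, ratio=6.0):
    """Flag short impulsive events by comparing per-frame RMS energy
    against the median frame energy of the chunk.

    frame_ms and ratio are hypothetical tuning parameters.
    Returns the start times (seconds) of frames whose energy spikes.
    """
    frame = int(sr * frame_ms / 1000)
    n = len(audio) // frame
    # per-frame RMS over non-overlapping frames
    rms = np.sqrt(np.mean(audio[:n * frame].reshape(n, frame) ** 2, axis=1))
    # robust noise floor: median is barely affected by a few loud frames
    floor = np.maximum(np.median(rms), 1e-8)
    hits = np.where(rms > ratio * floor)[0]
    return hits * frame_ms / 1000.0

# synthetic check: 1 s of low-level noise with a ~20 ms burst at 0.5 s
rng = np.random.default_rng(0)
sr = 16000
audio = 0.01 * rng.standard_normal(sr)
audio[8000:8320] += 0.8 * rng.standard_normal(320)
times = detect_transients(audio, sr)
print(times)  # frames around 0.5 s are flagged
```

The flagged times could then drive which short sub-window gets passed to YAMNet (or a dedicated impulse classifier), rather than embedding the whole chunk.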

Originally posted by u/niga_chan on r/ArtificialInteligence