Original Reddit post

Hey everyone,

Following up on my last VoxCPM2 benchmark: a lot of people asked how the model handles non-linear emotional delivery rather than just flat technical reading. I spent the last couple of days stress-testing its emotional range locally, focusing on how the architecture handles high-intensity projection (screaming/anger) versus low-energy micro-detail (whispering).

Here are the key takeaways from this emotional test:

The “Whisper Mode” Realism: Most open-source models completely fall apart or output pure static artifacts when you ask them to whisper. VoxCPM2 actually inserts synthetic micro-breaths right before syllables. This creates a proximity effect that genuinely tricks your brain into thinking someone is leaning into a condenser mic.

Heavy Projection (Screaming/Anger): By cranking the CFG value up to 3.0+ and adding “high crackle” to the control tags, I got the model to simulate vocal strain. It doesn't just make the audio louder; it changes the timbre so the speaker's vocal cords sound like they are actually under stress.

The Commands I Used: For anyone wanting to recreate these exact emotional states locally, here are the terminal commands:

For the Whisper Test:

voxcpm clone \
  --text "Hey… keep this database password safe. Don't push it to GitHub." \
  --control "whispering, micro-pauses, close to microphone, low breathy pitch" \
  --reference-audio reference_tutorial.wav \
  --cfg-value 2.0 \
  --output whisper_secret.wav
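Both commands set the --cfg-value flag. The post doesn't say how VoxCPM2 uses it internally; assuming it is the guidance weight w of standard classifier-free guidance, each generation step would mix an unconditional prediction with one conditioned on the text, control tags and reference audio (c). The second line just expands the first for the value used in the angry test:

```latex
\begin{aligned}
\hat{f}_\theta(x_t, c) &= f_\theta(x_t, \varnothing) + w\,\bigl(f_\theta(x_t, c) - f_\theta(x_t, \varnothing)\bigr) \\
\hat{f}_\theta(x_t, c)\big|_{w=3} &= 3\,f_\theta(x_t, c) - 2\,f_\theta(x_t, \varnothing)
\end{aligned}
```

Here f_θ stands for whatever the model predicts at step t (noise or velocity, depending on the sampler), and ∅ means the conditioning is dropped. At w = 1 this is plain conditional generation; every step above 1 extrapolates further past it in the direction the conditioning points, which is one plausible reason higher values exaggerate the requested style and can also overshoot into artifacts such as clipping.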

For the Angry/Screaming Test:

voxcpm clone \
  --text "I told you, don't touch my local environment setups!" \
  --control "screaming, angry tone, high crackle, sharp voice projection" \
  --reference-audio reference_tutorial.wav \
  --cfg-value 3.0 \
  --output angry_leak.wav

I put together a quick 45-second side-by-side audio comparison showing how the same cloned voice moves between these extreme emotional states in real time: https://youtube.com/shorts/9BucWPj8N3E

Let me know if you're getting heavy audio clipping when pushing the CFG past 3.0 on your local setups!
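To check the clipping question systematically, here is a rough bash sketch (it uses arrays, so plain sh won't work). It assumes the voxcpm clone flags behave exactly as in the commands above, which is not verified against VoxCPM2's actual CLI; sweep_cfg and wav_clip_stats are hypothetical helper names. sweep_cfg re-renders the angry prompt once per CFG value (dry run by default), and wav_clip_stats counts 16-bit samples pinned at full scale, a crude proxy for hard clipping that assumes 16-bit PCM WAVs with a canonical 44-byte header, read on a little-endian machine.

```shell
# Hypothetical helpers; source this file in bash.
# ASSUMPTION: `voxcpm clone` accepts exactly the flags shown in the post;
# nothing here is checked against VoxCPM2's real CLI.

# Re-run the angry prompt once per CFG value passed as an argument.
# With DRY_RUN=1 (the default) it only prints shell-quoted commands.
sweep_cfg() {
  local text="I told you, don't touch my local environment setups!"
  local control="screaming, angry tone, high crackle, sharp voice projection"
  local cfg cmd
  for cfg in "$@"; do
    cmd=(voxcpm clone
      --text "$text"
      --control "$control"
      --reference-audio reference_tutorial.wav
      --cfg-value "$cfg"
      --output "angry_cfg_${cfg}.wav")
    if [ "${DRY_RUN:-1}" = "1" ]; then
      printf '%q ' "${cmd[@]}"
      printf '\n'
    else
      "${cmd[@]}"
    fi
  done
}

# Rough clipping check for one output file.
# ASSUMPTION: 16-bit PCM WAV with a canonical 44-byte header, read on a
# little-endian machine (od uses host byte order). Float WAVs or files with
# extra header chunks need a real audio tool instead.
wav_clip_stats() {
  od -An -v -t d2 -j 44 "$1" | awk '
    BEGIN { n = 0; peak = 0; clipped = 0 }
    {
      for (i = 1; i <= NF; i++) {
        s = $i + 0
        a = (s < 0) ? -s : s
        if (a > peak) peak = a
        # Samples sitting at full scale are treated as clipped.
        if (s >= 32767 || s <= -32768) clipped++
        n++
      }
    }
    END { printf "samples=%d peak=%d clipped=%d\n", n, peak, clipped }'
}
```

For example, running DRY_RUN=0 sweep_cfg 2.0 2.5 3.0 3.5 4.0 and then for f in angry_cfg_*.wav; do echo "$f $(wav_clip_stats "$f")"; done would show whether the clipped count jumps once you go past 3.0. If the model writes float WAVs, sox with its stat effect is a more reliable check than this heuristic.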

Originally posted by u/Dry-Acanthaceae1402 on r/ArtificialInteligence