AI Wisdom
🎙️

Speech & Audio

Speech recognition, text-to-speech, voice cloning, and AI music generation.

Graduated · 5, Incubating · 3, Sandbox · 2 (10 total)

Whisper Large V3

Graduated
5/5

OpenAI's open-source speech recognition model, the gold standard for ASR

Best open-source ASR model by far. Supports 99 languages and timestamped output; speaker diarization is available through community extensions. Self-host for free or use it via a hosted API. Essential for any transcription pipeline.

Open Source
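
As a rough illustration of how timestamped output fits into a transcription pipeline, the sketch below turns a list of segments into SRT subtitle text. It assumes each segment is a mapping with `start` and `end` times in seconds plus a `text` field, which is modeled on the segment list returned by the open-source Whisper package. The function names and the sample data are hypothetical, and no model is actually run here.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Render timestamped segments as numbered SRT subtitle blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Transcribed text often carries a leading space, so strip it.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


# Hypothetical segments (assumed shape: start, end, text).
example_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 61.04, "text": " Today we talk about speech models."},
]
srt_text = segments_to_srt(example_segments)
```

In a real pipeline the segment list would come from the transcription step, and `srt_text` could be written to a `.srt` file alongside the audio.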

ElevenLabs

Graduated
5/5

Best-in-class text-to-speech with voice cloning and dubbing

Most natural-sounding TTS available. Voice cloning from 30 seconds of audio. Multilingual dubbing preserves emotion and cadence. API-first with generous free tier.

Proprietary

OpenAI TTS

Graduated
4/5

OpenAI's simple API for high-quality speech synthesis

Clean, natural-sounding voices with simple API. Six built-in voices. HD variant for highest quality. Best for developers wanting quick TTS integration without complexity.

Proprietary

Deepgram

Graduated
4/5

Real-time speech-to-text API with sub-300ms latency

Fastest production ASR API, with under 300ms latency for real-time use. Nova-2 model rivals Whisper quality. Best for live transcription, call centres, and real-time applications.

Proprietary

AssemblyAI

Graduated
4/5

Transcription API with built-in audio intelligence features

Excellent transcription with audio intelligence layered on top: sentiment analysis, topic detection, PII redaction, summarization. Universal-2 model handles accents and noise well.

Proprietary

Suno v4

Incubating
4/5

AI music generation from text prompts with full song structure

Generate full songs with vocals, instruments, and structure from text descriptions. Quality approaching amateur production. Game-changer for content creators, ads, and game audio.

Proprietary

Udio

Incubating
4/5

AI music creation with fine-grained style and genre control

Best genre fidelity in AI music: excels at specific styles from jazz to electronic. Higher audio quality than competitors. Good for musicians wanting AI-assisted composition.

Proprietary

Bark

Incubating
3/5

Open-source TTS model with emotions, music, and sound effects

Unique open-source TTS that generates speech with laughing, singing, and sound effects. Quality less consistent than ElevenLabs but fully self-hostable. Great for creative applications.

Open Source

Kokoro TTS

Sandbox
3/5

Lightweight 82M-parameter open-source text-to-speech model

Impressively natural speech from a tiny 82M model that runs on CPU. Good for edge deployment and resource-constrained environments. Quality approaching much larger models.

Open Source

Coqui TTS

Sandbox
3/5

Open-source TTS toolkit for training and deploying custom voices

Most flexible open-source TTS toolkit. Train custom voices on your own data. XTTS model supports 16 languages with voice cloning. Best for teams needing fully custom voice solutions.

Open Source