🎙️

Speech & Audio

Speech recognition, text-to-speech, voice cloning, and AI music generation.

Production · 5Stable · 3Experimental · 210 total

AssemblyAI

Production

4/5

Transcription API with built-in audio intelligence features

Excellent transcription with audio intelligence layered on top — sentiment analysis, topic detection, PII redaction, summarization. Universal-2 model handles accents and noise well.

Proprietary

Docs ↗

Bark

Stable

3/5

Open-source TTS model with emotions, music, and sound effects

Unique open-source TTS that generates speech with laughing, singing, and sound effects. Quality less consistent than ElevenLabs but fully self-hostable. Great for creative applications.

Open Source

Docs ↗

Coqui TTS

Experimental

3/5

Open-source TTS toolkit for training and deploying custom voices

Most flexible open-source TTS toolkit. Train custom voices on your own data. XTTS model supports 16 languages with voice cloning. Best for teams needing fully custom voice solutions.

Open Source

Docs ↗

Deepgram

Production

4/5

Real-time speech-to-text API with sub-300ms latency

Fastest production ASR API — under 300ms latency for real-time use. Nova-2 model rivals Whisper quality. Best for live transcription, call centres, and real-time applications.

Proprietary

Docs ↗

ElevenLabs

Production

5/5

Best-in-class text-to-speech with voice cloning and dubbing

Most natural-sounding TTS available. Voice cloning from 30 seconds of audio. Multilingual dubbing preserves emotion and cadence. API-first with generous free tier.

Proprietary

Docs ↗

Kokoro TTS

Experimental

3/5

Lightweight 82M-parameter open-source text-to-speech model

Impressively natural speech from a tiny 82M model — runs on CPU. Good for edge deployment and resource-constrained environments. Quality approaching much larger models.

Open Source

Docs ↗

OpenAI TTS

Production

4/5

OpenAI's simple API for high-quality speech synthesis

Clean, natural-sounding voices with simple API. Six built-in voices. HD variant for highest quality. Best for developers wanting quick TTS integration without complexity.

Proprietary

Docs ↗

Suno v4

Stable

4/5

AI music generation from text prompts with full song structure

Generate full songs with vocals, instruments, and structure from text descriptions. Quality approaching amateur production. Game-changer for content creators, ads, and game audio.

Proprietary

Docs ↗

Udio

Stable

4/5

AI music creation with fine-grained style and genre control

Best genre fidelity in AI music — excels at specific styles from jazz to electronic. Higher audio quality than competitors. Good for musicians wanting AI-assisted composition.

Proprietary

Docs ↗

Whisper Large V3

Production

5/5

OpenAI's open-source speech recognition model — gold standard for ASR

Best open-source ASR model by far. 99 languages, timestamps, speaker diarization via community extensions. Self-host for free or use via API. Essential for any transcription pipeline.

Open Source

Docs ↗

Other Categories

🧠Text Generation & Reasoning20 tools 💻Code Generation12 tools 🎨Image Generation12 tools 🎬Video Generation10 tools 🔗Embedding Models10 tools 👁️Multimodal & Vision10 tools 🤖AI Agents & Platforms10 tools 🔧Frameworks & SDKs12 tools ☁️Cloud AI Platforms10 tools 🗄️Vector Databases10 tools 🔬Data & Fine-tuning10 tools 📊Observability & Evals8 tools 🚀Model Serving8 tools 🛡️Guardrails & Safety8 tools