Built by Metorial, the integration platform for agentic AI.
Create a new voice from a text description. Returns voice previews with audio samples and generated voice IDs that can be used for text-to-speech.
Permanently delete a voice from your library. Only works on voices you own (cloned or designed). This action cannot be undone.
Update a voice's name, description, or labels. Only works on voices you own.
Get detailed metadata for a specific voice, including its settings, category, labels, fine-tuning status, and preview URL.
List all available ElevenLabs models with their capabilities. Useful for discovering which models support text-to-speech, voice conversion, and other features.
Dub audio or video content into another language. Provide a source URL, target language, and optional configuration. Returns a dubbing project ID whose status can be polled until the dub is complete.
Search and list available voices. Supports filtering by name, category, and voice type. Use this to find voice IDs for text-to-speech generation.
Generate sound effects from text descriptions. Returns base64-encoded audio. Useful for creating cinematic sound effects for videos, voice-overs, or games.
Get the status and details of a dubbing project. Use this to check if dubbing is complete and retrieve metadata about the project.
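The dubbing workflow described above (start a project, then check its status until it finishes) can be sketched as a small polling loop. This is a minimal sketch, not the integration's actual client: `get_status` is a hypothetical callable standing in for whatever invokes the status tool, and the status strings `"dubbed"` and `"failed"` are assumptions that should be checked against real responses.

```python
import time
from typing import Callable, Sequence


def wait_for_dubbing(
    get_status: Callable[[str], str],  # hypothetical: returns the project's status string
    dubbing_id: str,
    done_statuses: Sequence[str] = ("dubbed",),    # assumed "finished" value
    failed_statuses: Sequence[str] = ("failed",),  # assumed "error" value
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a dubbing project until it finishes, fails, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(dubbing_id)
        if status in done_statuses:
            return status
        if status in failed_statuses:
            raise RuntimeError(f"dubbing project {dubbing_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dubbing project {dubbing_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)  # wait before asking again
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.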
Convert text into lifelike audio using AI voices. Returns base64-encoded audio data. Supports multiple models (Flash, Turbo, Multilingual v2, v3), various output formats (MP3, PCM, Opus, μ-law), and fine-grained voice settings for stability, similarity, style, and speed.
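Because the audio comes back base64-encoded, it has to be decoded before it can be played or saved. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and how the base64 string is pulled out of the tool's response is not shown because the response shape is not specified here.

```python
import base64
from pathlib import Path


def decode_audio(audio_b64: str) -> bytes:
    """Decode a base64 string into raw audio bytes, rejecting invalid characters."""
    return base64.b64decode(audio_b64, validate=True)


def save_audio(audio_b64: str, path: str) -> int:
    """Decode the audio and write it to `path`; returns the number of bytes written."""
    data = decode_audio(audio_b64)
    Path(path).write_bytes(data)
    return len(data)
```

The file extension passed to `save_audio` should match the output format that was requested, for example `.mp3` for MP3 output.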
List previously generated audio items from your ElevenLabs history. Supports filtering by voice and search text. Returns metadata for each generation including text, voice, model, and timestamps.
Transcribe spoken audio into text. Supports speaker diarization, word-level timestamps, and language detection. Provide audio as base64-encoded data or via a cloud storage URL.
Get your ElevenLabs account information, including subscription tier, character usage and limits, voice slots, and billing details.
List available pronunciation dictionaries. Pronunciation dictionaries let you customize how specific words or phrases are spoken during text-to-speech generation.
Separate vocal tracks from background noise in an audio file. Accepts base64-encoded audio and returns the isolated vocals as base64-encoded audio.
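Since this tool takes base64 in and gives base64 back, a caller working with raw audio bytes needs an encode step before the call and a decode step after it. A minimal sketch of that round trip, assuming a hypothetical `call_tool` callable that accepts the base64 input string and returns the base64 output string (the real invocation mechanism and field names are not specified here):

```python
import base64
from typing import Callable


def isolate_vocals(raw_audio: bytes, call_tool: Callable[[str], str]) -> bytes:
    """Encode raw audio, pass it to the isolation tool, and decode the result.

    `call_tool` is a hypothetical stand-in for however the tool is invoked.
    """
    audio_b64 = base64.b64encode(raw_audio).decode("ascii")  # bytes -> base64 text
    result_b64 = call_tool(audio_b64)                        # tool returns base64 text
    return base64.b64decode(result_b64, validate=True)       # base64 text -> bytes
```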