Built by Metorial, the integration platform for agentic AI.
Provider Summary
transcribe audio to text
real-time speech streaming
detect speaker identity
analyze transcript sentiment
extract named entities
redact personal information
generate subtitles and captions
summarize audio content
translate transcripts
apply LLMs to speech data
Transcribe pre-recorded and live audio/video to text with support for 99+ languages, speaker diarization, and multichannel audio. Apply audio intelligence models to extract summaries, sentiment analysis, entity detection, topic detection, key phrases, and content moderation from transcripts. Redact personally identifiable information from text and audio. Generate SRT/VTT subtitles and segment transcripts into paragraphs, sentences, or auto-chapters. Stream real-time speech-to-text via WebSocket connections. Upload audio/video files for processing. Manage and delete transcripts. Access an LLM gateway to apply large language models (Claude, GPT, Gemini) to transcribed speech data for summarization, Q&A, and custom analysis. Translate transcripts across 99+ languages. Receive webhook notifications when transcriptions complete or fail.
Generate a temporary authentication token for use with AssemblyAI's real-time streaming speech-to-text WebSocket API. Use this to securely authenticate client-side streaming without exposing your main API key. Each token is single-use and valid for one streaming session.
Delete a transcript. Its data is permanently removed and the transcript is marked as deleted, although the transcript resource itself remains. Any files uploaded via the upload endpoint are deleted immediately along with the transcript.
Retrieve the URL for a PII-redacted audio file. The original transcription must have been submitted with PII audio redaction enabled (\
Export a completed transcript in SRT or VTT format for use as subtitles or closed captions in video players. Optionally limit the number of characters per caption line.
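As a rough illustration of the subtitle export described above, the sketch below builds the request URL for an SRT or VTT export. The base URL, the `/v2/transcript/{id}/{format}` path, and the `chars_per_caption` query parameter reflect AssemblyAI's public REST API as understood here and are assumptions; the tool's own input names may differ. `subtitle_url` is a hypothetical helper, not part of this integration.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed base URL of the AssemblyAI REST API.
BASE_URL = "https://api.assemblyai.com"


def subtitle_url(
    transcript_id: str,
    fmt: str = "srt",
    chars_per_caption: Optional[int] = None,
) -> str:
    """Build the (assumed) export URL for SRT or VTT subtitles."""
    if fmt not in ("srt", "vtt"):
        raise ValueError("fmt must be 'srt' or 'vtt'")
    url = f"{BASE_URL}/v2/transcript/{transcript_id}/{fmt}"
    if chars_per_caption is not None:
        if chars_per_caption <= 0:
            raise ValueError("chars_per_caption must be positive")
        # Optional cap on characters per caption line.
        url += "?" + urlencode({"chars_per_caption": chars_per_caption})
    return url
```

For example, `subtitle_url("abc123", "vtt", 32)` yields a VTT export URL with captions capped at 32 characters.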
Retrieve a completed transcript's text segmented into sentences or paragraphs. The API semantically segments the text for more reader-friendly output. Choose "sentences" or "paragraphs" segmentation depending on how granular you need the output.
Retrieve a transcript by its ID. Returns the full transcript object including text, words with timestamps, speaker labels, and any enabled audio intelligence results (summary, sentiment, entities, topics, chapters, content safety, key phrases). Use this to poll for completion after submitting a transcription, or to retrieve results of a completed transcript.
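Polling for completion could look roughly like the sketch below. The `fetch` callable is a hypothetical stand-in for whatever retrieves the transcript object (for instance, a wrapper around this tool). The terminal status values `"completed"` and `"error"` and the `"error"` field are assumptions based on AssemblyAI's public API rather than anything stated in this listing.

```python
import time
from typing import Any, Callable, Dict


def wait_for_transcript(
    fetch: Callable[[str], Dict[str, Any]],
    transcript_id: str,
    interval_s: float = 3.0,
    max_attempts: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the transcript reaches a terminal status."""
    for _ in range(max_attempts):
        transcript = fetch(transcript_id)
        status = transcript.get("status")
        if status == "completed":  # assumed terminal success status
            return transcript
        if status == "error":  # assumed terminal failure status
            raise RuntimeError(transcript.get("error", "transcription failed"))
        # Any other status (e.g. "queued") means: wait and try again.
        sleep(interval_s)
    raise TimeoutError(f"transcript {transcript_id} not finished after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays. Where latency matters, the webhook notifications mentioned in the summary avoid polling altogether.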
Apply a large language model to one or more transcripts using AssemblyAI's LeMUR framework. Submit a custom prompt along with transcript IDs or raw text input, and receive an LLM-generated response. Use this for summarizing transcripts, extracting insights, answering questions about audio content, generating action items, or any custom analysis task. Supports multiple LLM providers including Claude, GPT, and Gemini models.
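A request for this tool might be assembled as in the sketch below. The field names `prompt`, `transcript_ids`, `input_text`, and `final_model` follow AssemblyAI's LeMUR API as understood here and are assumptions. The rule that exactly one of transcript IDs or raw text is supplied is likewise an assumption of this sketch, and `build_llm_task` is a hypothetical helper.

```python
from typing import Any, Dict, Optional, Sequence


def build_llm_task(
    prompt: str,
    transcript_ids: Optional[Sequence[str]] = None,
    input_text: Optional[str] = None,
    model: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an (assumed) request body for a custom LLM task."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Assumption: the input is either transcript IDs or raw text, not both.
    if bool(transcript_ids) == bool(input_text):
        raise ValueError("provide exactly one of transcript_ids or input_text")
    body: Dict[str, Any] = {"prompt": prompt}
    if transcript_ids:
        body["transcript_ids"] = list(transcript_ids)
    else:
        body["input_text"] = input_text
    if model:
        body["final_model"] = model  # model identifier is left to the caller
    return body
```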
List transcripts with pagination and optional filters. Returns transcript summaries sorted from newest to oldest. Supports filtering by status and creation date, and cursor-based pagination using before/after IDs.
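Because results come back newest first, walking the full list with a "before" cursor can be sketched as follows. `list_page` is a hypothetical callable that takes the query parameters and returns one page of transcript summaries. The parameter names `limit`, `status`, and `before_id` are assumptions modeled on AssemblyAI's public API.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def iter_transcripts(
    list_page: Callable[[Dict[str, Any]], List[Dict[str, Any]]],
    status: Optional[str] = None,
    limit: int = 50,
) -> Iterator[Dict[str, Any]]:
    """Yield transcript summaries page by page, newest to oldest."""
    before_id: Optional[str] = None
    while True:
        params: Dict[str, Any] = {"limit": limit}
        if status:
            params["status"] = status
        if before_id:
            params["before_id"] = before_id
        page = list_page(params)
        if not page:
            return  # an empty page means there is nothing older left
        yield from page
        # The last item on the page is the oldest; continue from before it.
        before_id = page[-1]["id"]
```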
Search through a completed transcript for specific keywords. You can search for individual words, numbers, or phrases of up to five words. Returns match counts and timestamps for each keyword found.
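The five-word limit on phrases can be checked before a search is sent. The sketch below validates the terms and joins them with commas, which is how AssemblyAI's word-search endpoint is understood to accept them; that format is an assumption here, and `build_word_search_param` is a hypothetical helper.

```python
from typing import Iterable, List


def build_word_search_param(terms: Iterable[str]) -> str:
    """Validate search terms and join them into one comma-separated string."""
    cleaned: List[str] = []
    for term in terms:
        words = term.split()
        if not words:
            raise ValueError("search terms must not be empty")
        if len(words) > 5:
            raise ValueError(f"phrase exceeds five words: {term!r}")
        # Normalize internal whitespace to single spaces.
        cleaned.append(" ".join(words))
    if not cleaned:
        raise ValueError("at least one search term is required")
    return ",".join(cleaned)
```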
Submit an audio or video file for asynchronous transcription. Provide a publicly accessible URL to the media file. Optionally enable audio intelligence features like summarization, sentiment analysis, entity detection, topic detection, content moderation, key phrases, auto chapters, and PII redaction. Returns the transcript object with a status of "queued" — poll using the Get Transcript tool to check for completion.
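A submission body with optional features might be built as sketched below. The field names (`audio_url`, `iab_categories` for topic detection, `content_safety` for content moderation, `auto_highlights` for key phrases, and so on) follow AssemblyAI's public REST API as understood here and are assumptions; this tool's input schema may use different names. PII redaction is left out because it typically needs additional policy options. `build_transcript_request` and the friendly feature names are hypothetical.

```python
from typing import Any, Dict, Iterable
from urllib.parse import urlparse

# Friendly feature name -> assumed AssemblyAI request field.
FEATURE_FLAGS: Dict[str, str] = {
    "summarization": "summarization",
    "sentiment_analysis": "sentiment_analysis",
    "entity_detection": "entity_detection",
    "topic_detection": "iab_categories",
    "content_moderation": "content_safety",
    "key_phrases": "auto_highlights",
    "auto_chapters": "auto_chapters",
}


def build_transcript_request(audio_url: str, features: Iterable[str] = ()) -> Dict[str, Any]:
    """Build an (assumed) JSON body for submitting a transcription."""
    # The media file must be reachable over a public http(s) URL.
    if urlparse(audio_url).scheme not in ("http", "https"):
        raise ValueError("audio_url must be an http(s) URL")
    body: Dict[str, Any] = {"audio_url": audio_url}
    for feature in features:
        if feature not in FEATURE_FLAGS:
            raise ValueError(f"unknown feature: {feature}")
        body[FEATURE_FLAGS[feature]] = True
    return body
```

The returned transcript would start out as "queued", so a submission like this is normally followed by polling or a webhook.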
This integration is licensed under the AGPL-3.0 License.