Model Details
ElevenLabs Scribe v1 turns spoken-audio files into accurate written text. Pass a URL to an audio recording (mp3, wav, m4a, ogg, or aac) and it returns the full transcript as a plain string, with state-of-the-art accuracy across 99 languages. It auto-detects the spoken language, can label who is speaking (diarization), and tags non-speech audio events like laughter and applause — making it a strong default for turning recordings into usable, searchable text.
## Best for

- Transcribing meetings, interviews, podcasts, and voice notes into text
- Captioning and subtitling source audio with reliable word boundaries
- Multilingual transcription where the spoken language is unknown or mixed (99 languages, auto-detected)
- Speaker-attributed transcripts of multi-person conversations using diarization
- Building searchable archives or downstream NLP from spoken-audio content
## Choose another model when

- You want to generate speech from text rather than transcribe it: use a text-to-speech model
- You need to translate audio into a different language's text: this transcribes in the spoken language, it is not a speech translator
- You need live, streaming transcription of an in-progress call: this processes a complete uploaded file and returns a finished transcript
## Tips

- Leave `language_code` unset to auto-detect the spoken language; set it to an ISO-639 code (e.g. `eng`, `spa`, `fra`, `deu`, `jpn`) only when you already know the language and want to skip detection.
- Keep `diarize` enabled (default) for multi-speaker recordings; the model attributes each word to a speaker. Set it to `false` for single-speaker audio to skip speaker labeling.
- Keep `tag_audio_events` enabled (default) to mark non-speech sounds (laughter, applause) inline; set it to `false` for a clean speech-only transcript.
- Use clear, reasonably loud source audio; heavy background noise and overlapping speech reduce accuracy.
## Advanced Configuration

- `language_code` (default auto-detect): an ISO-639 language code that forces the transcription language instead of detecting it. Useful when the audio is short or the language is known in advance.
- `tag_audio_events` (boolean, default `true`): when `true`, non-speech events such as laughter and applause are tagged inline in the transcript.
- `diarize` (boolean, default `true`): when `true`, annotates which speaker said each word.
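
The options above can be combined into a single `input` object. The following is a minimal sketch, not part of the ModelRunner client: `buildScribeInput` and its option names (`languageCode`, `singleSpeaker`, `speechOnly`) are hypothetical, and the `example.com` URL is a placeholder. Only `audio_url`, `language_code`, `diarize`, and `tag_audio_events` come from this page.

```js
// Hypothetical helper: builds the `input` object for elevenlabs/scribe-v1.
// The option names (languageCode, singleSpeaker, speechOnly) are illustrative.
function buildScribeInput(audioUrl, options = {}) {
  const { languageCode, singleSpeaker = false, speechOnly = false } = options;

  const input = {
    audio_url: audioUrl,
    // Diarization is on by default; turn it off for single-speaker audio.
    diarize: !singleSpeaker,
    // Audio-event tags are on by default; turn them off for speech-only text.
    tag_audio_events: !speechOnly,
  };

  // Send language_code only when the language is known in advance;
  // omitting it leaves auto-detection in place.
  if (languageCode) {
    input.language_code = languageCode;
  }

  return input;
}

// Example: a Spanish voice note from one speaker, without event tags.
const voiceNoteInput = buildScribeInput("https://example.com/voice-note.mp3", {
  languageCode: "spa",
  singleSpeaker: true,
  speechOnly: true,
});
console.log(voiceNoteInput);
```

Calling `buildScribeInput(url)` with no options reproduces the documented defaults, so the helper only changes the request when a caller opts out of diarization, event tagging, or language detection.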
To run via the ModelRunner JavaScript client:

```js
import { modelrunner } from "@modelrunner/client";

const result = await modelrunner.subscribe("elevenlabs/scribe-v1", {
  input: {
    audio_url: "https://storage.googleapis.com/falserverless/web-examples/elevenlabs/sample.mp3",
    diarize: true,
    tag_audio_events: true,
  },
});
```
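
Before submitting a request, it may help to check that the URL points at one of the formats this page lists (mp3, wav, m4a, ogg, aac). The sketch below is a client-side heuristic only: `hasSupportedAudioExtension` is a hypothetical helper, and it assumes the file extension appears in the URL path. A URL without an extension may still serve valid audio, so treat a `false` result as a warning rather than a hard failure.

```js
// Formats listed in the model description on this page.
const SUPPORTED_EXTENSIONS = ["mp3", "wav", "m4a", "ogg", "aac"];

// Hypothetical pre-flight check: returns true when the URL path ends in one
// of the listed extensions. Query strings and fragments are ignored.
function hasSupportedAudioExtension(audioUrl) {
  let pathname;
  try {
    pathname = new URL(audioUrl).pathname;
  } catch {
    return false; // not a parseable absolute URL
  }
  const match = pathname.toLowerCase().match(/\.([a-z0-9]+)$/);
  return match !== null && SUPPORTED_EXTENSIONS.includes(match[1]);
}

const sampleUrl =
  "https://storage.googleapis.com/falserverless/web-examples/elevenlabs/sample.mp3";
if (!hasSupportedAudioExtension(sampleUrl)) {
  console.warn("URL does not end in a listed audio extension:", sampleUrl);
}
```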

