Introducing Scribe v2 | The Next Input

Today we’re introducing Scribe v2: the most accurate transcription model ever released, with support for more than 90 languages.

Scribe v2 is built for batch transcription, subtitling, and captioning at scale. It improves on the stability and accuracy of Scribe v1, with better handling of long-form audio, pauses, changes in tone, and extended silences.

Scribe v2 Realtime is optimized for ultra-low latency and agent use cases. Scribe v2 is optimized for long and complex recordings, maintaining accuracy across diverse speakers, accents, and delivery styles. The result is consistently reliable transcripts across a wide range of real-world audio conditions.

Scribe v2 achieves the lowest word error rate recorded on industry-standard benchmarks.

Keyterm Prompting for context-aware transcription

Keyterm prompting goes beyond standard Custom Vocabulary by using the transcript's context. Supply up to 100 words or phrases, and Scribe v2 uses the surrounding context to decide when each term should appear in the transcript. This makes it well suited for technical domains, brand names, and industry-specific language.
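As a rough illustration of preparing a keyterm list before sending it, the sketch below cleans the terms and enforces the 100-term limit mentioned above. The helper names and the `model_id` and `keyterms` field names are assumptions made for this example, not confirmed API parameters; check the API reference for the real ones.

```python
from typing import Iterable, List

MAX_KEYTERMS = 100  # limit stated in the announcement


def prepare_keyterms(terms: Iterable[str]) -> List[str]:
    """Collapse whitespace, drop empty and duplicate terms, enforce the limit."""
    seen = set()
    cleaned = []
    for term in terms:
        normalized = " ".join(term.split())
        key = normalized.casefold()  # duplicates are compared case-insensitively
        if not normalized or key in seen:
            continue
        seen.add(key)
        cleaned.append(normalized)
    if len(cleaned) > MAX_KEYTERMS:
        raise ValueError(
            f"{len(cleaned)} keyterms supplied; the limit is {MAX_KEYTERMS}"
        )
    return cleaned


def build_request_fields(terms: Iterable[str]) -> dict:
    # Hypothetical field names, used here for illustration only.
    return {"model_id": "scribe_v2", "keyterms": prepare_keyterms(terms)}
```

Validating on the client side keeps an oversized or messy list from reaching the service, so the failure is reported before any audio is uploaded.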

Built-in entity detection with precise timestamps

Scribe v2 includes native entity detection for structured audio analysis. You can select up to 56 categories spanning personally identifiable information (PII), health data, and payment details. Scribe v2 automatically detects these entities and their exact timestamps in your transcript, making it easier to review, redact, or process sensitive information at scale.

Learn more in the API documentation: https://elevenlabs.io/docs/developers/guides/cookbooks/speech-to-text/batch/entity-detection
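One way the detected timestamps might be used for redaction is sketched below: it turns a list of detected entities into merged time intervals that a downstream tool could mute or bleep. The `Entity` shape, the category labels, and the `padding` parameter are assumptions for this example and do not describe the actual response format, which is covered in the documentation linked above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class Entity:
    category: str  # hypothetical label, e.g. "credit_card"
    start: float   # seconds from the start of the audio
    end: float     # seconds from the start of the audio


def redaction_intervals(
    entities: Iterable[Entity],
    categories: Set[str],
    padding: float = 0.1,
) -> List[Tuple[float, float]]:
    """Return sorted, non-overlapping intervals covering the selected entities."""
    selected = sorted(
        (max(0.0, e.start - padding), e.end + padding)
        for e in entities
        if e.category in categories
    )
    merged: List[Tuple[float, float]] = []
    for start, end in selected:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Merging overlapping intervals first means the audio editor applies one mute per sensitive passage rather than several overlapping ones.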

Automatic multi-language transcription

Scribe v2 supports smart multi-language workflows out of the box.

You can send audio that contains multiple languages in a single file. The model automatically detects each language and transcribes it correctly without manual segmentation or configuration.
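If a transcript carried a language label per word, a mixed-language file could be split into per-language runs after the fact, as in the sketch below. The per-word `language` and `text` keys are an assumption made for this example; the announcement does not describe how detected languages are reported.

```python
from itertools import groupby
from typing import Dict, List


def language_runs(words: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Group consecutive words that share a language into one run each."""
    runs = []
    for language, group in groupby(words, key=lambda w: w["language"]):
        runs.append({"language": language, "text": " ".join(w["text"] for w in group)})
    return runs
```

Because `groupby` only groups adjacent items, a speaker who switches back to an earlier language produces a new run instead of being folded into the first one, which preserves the order of the conversation.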

Additional features for production workflows

Scribe v2 includes a set of features designed for enterprise and developer use cases:

Smart speaker diarization for clear, intuitive speaker labeling

Precise word-level timestamps for accurate subtitle alignment and interactive experiences

Dynamic audio tagging that detects non-speech events such as laughter or footsteps

Enterprise readiness with SOC 2, ISO 27001, PCI DSS L1, HIPAA, and GDPR compliance, EU and India data residency, and zero retention mode support
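To show how word-level timestamps and speaker labels can feed subtitle alignment, here is a minimal sketch that groups words into SRT cues. It assumes each word is a dictionary with `text`, `start`, `end` (in seconds), and `speaker` keys; that shape and the `max_chars` cue limit are assumptions for this example, not the documented response format.

```python
from typing import Any, Dict, List


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: List[Dict[str, Any]], max_chars: int = 42) -> str:
    """Build SRT text, starting a new cue on a speaker change or a long line."""
    cues: List[List[Dict[str, Any]]] = []
    current: List[Dict[str, Any]] = []
    for word in words:
        candidate = " ".join(w["text"] for w in current + [word])
        speaker_changed = bool(current) and word["speaker"] != current[0]["speaker"]
        if current and (speaker_changed or len(candidate) > max_chars):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_timestamp(cue[0]["start"])
        end = format_timestamp(cue[-1]["end"])
        text = " ".join(w["text"] for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n[{cue[0]['speaker']}] {text}")
    return "\n\n".join(blocks) + "\n"
```

Each cue takes its start time from its first word and its end time from its last word, so cue boundaries follow the speech itself instead of fixed time windows.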

Scribe v2, now in ElevenLabs Studio

Scribe v2 is now used in ElevenLabs Studio for more accurate subtitles, captions and transcriptions, supporting teams that manage large libraries of audio and video across marketing, media, research, training, and compliance use cases.
