Today we’re introducing Scribe v2: the most accurate transcription model ever released, with support for more than 90 languages.
Introduction
Scribe v2 is built for batch transcription, subtitling, and captioning at scale. It improves on the stability and accuracy of Scribe v1, with better handling of long-form audio, pauses, changes in tone, and extended silences.
While Scribe v2 Realtime is optimized for ultra-low latency and agent use cases, Scribe v2 is optimized for long and complex recordings, maintaining accuracy across diverse speakers, accents, and delivery styles. The result is consistently reliable transcripts across a wide range of real-world audio conditions.
Scribe v2 achieves the lowest word error rate recorded on industry-standard benchmarks.
Keyterm Prompting for context-aware transcription
Keyterm prompting goes beyond standard Custom Vocabulary by using the transcript’s context. Supply up to 100 words or phrases, and Scribe v2 uses the surrounding context to decide when those terms should appear in the transcript. This makes it well suited for technical domains, brand names, and industry-specific language.
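As a rough illustration of the 100-term limit, the sketch below cleans a keyterm list before it is sent with a transcription request. The helper name `prepare_keyterms` and the de-duplication rules are assumptions made for this example; they are not part of the ElevenLabs SDK, and the request itself is left out.

```python
MAX_KEYTERMS = 100  # limit stated in this announcement


def prepare_keyterms(terms):
    """Trim, de-duplicate (case-insensitively), and enforce the keyterm cap.

    Hypothetical helper for illustration only; not an ElevenLabs API.
    """
    seen = set()
    cleaned = []
    for term in terms:
        normalized = " ".join(term.split())  # collapse stray whitespace
        if not normalized:
            continue  # skip empty entries
        key = normalized.casefold()
        if key in seen:
            continue  # keep only the first spelling of a repeated term
        seen.add(key)
        cleaned.append(normalized)
    if len(cleaned) > MAX_KEYTERMS:
        raise ValueError(
            f"{len(cleaned)} keyterms supplied; the limit is {MAX_KEYTERMS}"
        )
    return cleaned


keyterms = prepare_keyterms(["ElevenLabs", " elevenlabs ", "Scribe  v2", ""])
print(keyterms)
```

Failing early on an oversized list keeps the error close to the code that built it, rather than surfacing as a rejected request later.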
Built-in entity detection with precise timestamps
Scribe v2 includes native entity detection for structured audio analysis. You can select up to 56 categories spanning personally identifiable information (PII), health data, and payment details. Scribe v2 automatically detects these instances and their exact timestamps in your transcript, making it easier to review, redact, or process sensitive information at scale.
Learn more in the API documentation: https://elevenlabs.io/docs/developers/guides/cookbooks/speech-to-text/batch/entity-detection
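To make the review-and-redact workflow concrete, here is a minimal sketch of redacting detected entities from a transcript. The entity fields used here (`entity_type`, `start_char`, `end_char`, `start`, `end`) are assumptions chosen for illustration and may not match the actual response schema, so check the API documentation linked above for the real field names. The sketch also assumes that entity spans do not overlap.

```python
def redact(text, entities):
    """Replace each entity span in `text` with a bracketed type label.

    `entities` is a list of dicts with hypothetical keys:
    entity_type (str), start_char and end_char (character offsets into text).
    Spans are assumed not to overlap.
    """
    redacted = text
    # Work from the end of the text so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start_char"], reverse=True):
        label = f"[{ent['entity_type'].upper()}]"
        redacted = redacted[: ent["start_char"]] + label + redacted[ent["end_char"]:]
    return redacted


def time_ranges(entities):
    """Return (start, end) times in seconds, sorted, e.g. for muting audio."""
    return sorted((ent["start"], ent["end"]) for ent in entities)


transcript = "Call Jane Doe at 555-0100."
detected = [  # hypothetical detection output
    {"entity_type": "name", "start_char": 5, "end_char": 13, "start": 0.4, "end": 1.1},
    {"entity_type": "phone_number", "start_char": 17, "end_char": 25, "start": 1.5, "end": 3.0},
]
print(redact(transcript, detected))
print(time_ranges(detected))
```

Replacing spans from the end of the string backwards avoids recomputing offsets after each substitution.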
Automatic multi-language transcription
Scribe v2 supports smart multi-language workflows out of the box.
You can send audio that contains multiple languages in a single file. The model automatically detects each language and transcribes it correctly without manual segmentation or configuration.
Additional features for production workflows
Scribe v2 includes a set of features designed for enterprise and developer use cases:
Smart speaker diarization for clear, intuitive speaker labeling
Precise word-level timestamps for accurate subtitle alignment and interactive experiences
Dynamic audio tagging that detects non-speech events such as laughter or footsteps
Enterprise readiness with SOC 2, ISO 27001, PCI DSS L1, HIPAA, and GDPR compliance, EU and India data residency, and zero retention mode support
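As one example of what word-level timestamps enable, the sketch below groups timed words into SubRip (SRT) subtitle cues. It assumes each word is a dict with `text`, `start`, and `end` keys, with times in seconds; that shape is an assumption for this example, and the cue length and pause thresholds are arbitrary choices, not values recommended by ElevenLabs.

```python
def format_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_chars=42, max_gap=0.8):
    """Group timed words into SRT cues.

    A new cue starts when adding a word would exceed `max_chars`
    or when the pause before the word is longer than `max_gap` seconds.
    """
    cues, current = [], []
    for word in words:
        if current:
            candidate = " ".join(w["text"] for w in current + [word])
            gap = word["start"] - current[-1]["end"]
            if len(candidate) > max_chars or gap > max_gap:
                cues.append(current)
                current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_timestamp(cue[0]["start"])
        end = format_timestamp(cue[-1]["end"])
        text = " ".join(w["text"] for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}")
    return "\n\n".join(blocks) + "\n" if blocks else ""


sample_words = [  # hypothetical transcription output
    {"text": "Hello", "start": 0.0, "end": 0.4},
    {"text": "world", "start": 0.5, "end": 0.9},
    {"text": "Again", "start": 2.5, "end": 2.9},
]
print(words_to_srt(sample_words))
```

Breaking cues on pauses as well as on length tends to keep subtitles aligned with natural phrasing; a diarized transcript could additionally start a new cue whenever the speaker changes.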
Scribe v2, now in ElevenLabs Studio
Scribe v2 is now used in ElevenLabs Studio for more accurate subtitles, captions and transcriptions, supporting teams that manage large libraries of audio and video across marketing, media, research, training, and compliance use cases.