Features & Capabilities FAQ
Common questions about what VideoToBe can do.
How accurate is the transcription?
VideoToBe uses OpenAI's Whisper model. Accuracy depends on audio quality:
- Clear audio: 95%+ accuracy
- Good audio: 90-95% accuracy
- Noisy audio: 70-85% accuracy
What languages are supported?
Whisper supports 90+ languages including:
- English, Spanish, French, German, Italian, Portuguese
- Chinese (Mandarin), Japanese, Korean
- Arabic, Hindi, Russian, Turkish
- And 80+ more languages
Language is automatically detected.
Can I translate my transcript?
Yes! Enable "Translate to English" when uploading to automatically translate any of the 90+ supported languages to English.
How does speaker identification work?
Enable "Speaker Diarization" when uploading. VideoToBe will:
- Analyze voice patterns
- Assign speaker labels (Speaker 1, Speaker 2, etc.)
You can rename speakers after transcription.
What download formats are available?
Every transcription generates:
- TXT: Plain text for easy reading
- SRT: Subtitle format with timestamps
- VTT: Web video subtitle format
- JSON: Structured data for developers
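For developers who want to work with the timestamped SRT download, here is a minimal Python sketch that reads cues into start time, end time, and text. It assumes the file follows the common SubRip layout (a cue number, a timing line such as 00:00:01,000 --> 00:00:04,500, then one or more text lines, with blank lines between cues). The names Cue and parse_srt are hypothetical helpers for illustration and are not part of VideoToBe.

```python
import re
from dataclasses import dataclass

# Timing line in the common SubRip layout: "00:00:01,000 --> 00:00:04,500"
_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    """One subtitle cue (hypothetical helper type, not a VideoToBe type)."""
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording
    text: str


def _seconds(hours: str, minutes: str, secs: str, millis: str) -> float:
    """Convert the four captured timestamp fields to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(secs) + int(millis) / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into a list of cues, skipping blocks without a timing line."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for index, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                fields = match.groups()
                # Everything after the timing line is the cue text.
                text = " ".join(part.strip() for part in lines[index + 1:])
                cues.append(Cue(_seconds(*fields[:4]), _seconds(*fields[4:]), text))
                break
    return cues
```

To use it, read the downloaded file as text (for example with open(path, encoding="utf-8").read()) and pass the result to parse_srt. The encoding is an assumption; adjust it if your download differs.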
Can I edit the transcript?
Yes! You can edit transcript text and rename speakers directly in VideoToBe. Changes are saved and reflected in all download formats.
Is there an API?
Not currently. VideoToBe is a web application without a public API. This feature may be added in the future.
Can I bulk upload files?
You can upload multiple files, but only one at a time. Bulk upload via folder drag-and-drop is not currently supported.
How long does transcription take?
Processing time depends on file length:
- Short files (0-10 min): 1-3 minutes
- Medium files (10-30 min): 3-8 minutes
- Long files (30-60 min): 8-15 minutes
- Very long files (60+ min): 15-30 minutes
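The ranges above can be turned into a simple lookup, for example to show an expected wait time in your own tooling. This is a rough sketch. The function name estimate_processing_minutes is hypothetical, and because the listed ranges share their boundary values (10, 30, and 60 minutes), assigning each boundary to the shorter category is an assumption.

```python
def estimate_processing_minutes(duration_min: float) -> tuple[int, int]:
    """Return the (low, high) processing estimate in minutes for a file.

    The thresholds mirror the FAQ ranges. Boundary values (10, 30, 60) are
    placed in the shorter category, which is an assumption.
    """
    if duration_min < 0:
        raise ValueError("duration_min must be non-negative")
    if duration_min <= 10:
        return (1, 3)    # short files
    if duration_min <= 30:
        return (3, 8)    # medium files
    if duration_min <= 60:
        return (8, 15)   # long files
    return (15, 30)      # very long files


low, high = estimate_processing_minutes(45)
print(f"Expect roughly {low}-{high} minutes")
```

These figures are estimates taken from the list above, not guarantees.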
Can I transcribe live audio?
No. VideoToBe only transcribes pre-recorded files. Real-time live transcription is not supported.