Transcription Problems
Having issues with transcription quality or processing? This guide helps you diagnose and fix common transcription problems.
Transcription Failed
Error Message
Transcription job failed: [error]
Cause
Transcription can fail for several reasons:
- Corrupted or invalid audio/video file
- Unsupported codec or container format
- File too large for processing (over 2 GB)
- Modal.com transcription service error
- Network timeout during processing
- R2 storage access issues
Solution
- Verify file integrity:
  - Try playing the file in VLC or another media player
  - If it won't play, the file is likely corrupted; try re-recording or re-exporting it
- Check file format:
  - Use common formats: MP4 (H.264), MP3, WAV
  - Avoid exotic codecs or uncommon container formats
- Reduce file size:
  - If the file is over 2 GB, compress it using HandBrake or a similar tool
  - For very long recordings, consider splitting the file into chunks
- Retry the upload:
  - Delete the failed content and upload again
  - Failed attempts don't count toward your daily limit
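If you prepare files with a script, the size and format checks above can be run locally before uploading. The following is a minimal sketch, not part of VideoToBe: the function names are hypothetical, the 2 GB limit is assumed to mean 2 * 1024**3 bytes, and the extension list is an assumption based on the formats this guide recommends.

```python
from pathlib import Path

# Assumptions for illustration: the 2 GB limit is read as 2 * 1024**3 bytes,
# and the extension set mirrors the formats recommended in this guide.
MAX_BYTES = 2 * 1024**3
COMMON_EXTENSIONS = {".mp4", ".mp3", ".wav", ".flac"}


def preflight_issues(size_bytes: int, filename: str) -> list[str]:
    """Return human-readable warnings for a file before upload."""
    issues = []
    if size_bytes == 0:
        issues.append("file is empty")
    if size_bytes > MAX_BYTES:
        issues.append("file is over 2 GB; compress or split it")
    if Path(filename).suffix.lower() not in COMMON_EXTENSIONS:
        issues.append("uncommon container; consider MP4, MP3, or WAV")
    return issues


def check_file(path: str) -> list[str]:
    """Convenience wrapper that reads the size from disk."""
    p = Path(path)
    return preflight_issues(p.stat().st_size, p.name)
```

An empty list means none of these checks found a problem. It does not prove the file is playable, so the media-player test is still worth doing.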
Code Reference
Transcription processing happens in scribe/run_job.py, which runs OpenAI's Whisper model on Modal.com.
Poor Transcription Accuracy
Cause
Low accuracy can result from:
- Poor audio quality (background noise, low volume, echo)
- Heavy accents or dialects
- Technical jargon or uncommon terminology
- Multiple overlapping speakers
- Fast speech or mumbling
- Non-standard language usage (slang, abbreviations)
Solution
1. Use an Initial Prompt
When uploading, provide context to improve accuracy:
- Technical terms: "This is a software engineering discussion about React, TypeScript, and API endpoints"
- Names: "Participants include Dr. Sarah Johnson, Michael Chen, and Emma Williams"
- Industry jargon: "Medical consultation discussing diabetes, insulin resistance, and HbA1c levels"
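If you upload many recordings, it can help to build prompts like these from a list of terms and names. The helper below is a hypothetical sketch, not a VideoToBe API. The open-source Whisper package accepts this kind of text through the `initial_prompt` argument of `transcribe`; whether VideoToBe forwards your prompt in exactly that way is an assumption.

```python
def build_initial_prompt(topic: str = "", terms=(), names=()) -> str:
    """Compose a short context sentence from a topic, key terms, and names.

    Hypothetical helper for illustration; keep the result short, since it is
    a hint for the model rather than a full description of the recording.
    """
    parts = []
    if topic:
        parts.append(f"This is {topic}.")
    if terms:
        parts.append("Key terms: " + ", ".join(terms) + ".")
    if names:
        parts.append("Participants include " + ", ".join(names) + ".")
    return " ".join(parts)


prompt = build_initial_prompt(
    topic="a software engineering discussion",
    terms=["React", "TypeScript", "API endpoints"],
    names=["Dr. Sarah Johnson", "Michael Chen"],
)
print(prompt)
```

Spelling the terms and names in the prompt exactly as you want them to appear in the transcript is the main point of the exercise.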
2. Improve Audio Quality
- Use external microphones instead of built-in laptop mics
- Record in quiet environments
- Position the microphone 6-12 inches (15-30 cm) from the speaker
- Avoid background noise (fans, air conditioning, traffic)
- Use audio filters to reduce noise if needed
3. Speaker Considerations
- Speak clearly at a normal pace
- Avoid talking over each other (speaker diarization helps attribute turns, but frequent overlap still reduces accuracy)
- Pause between sentences
- Spell out uncommon names or terms when first mentioned
4. Post-Processing
- Edit the transcript directly in VideoToBe after completion
- Use the Chat feature to ask AI to fix specific errors
- Download and edit in your preferred text editor
Expected Accuracy
VideoToBe uses OpenAI's Whisper model, which typically achieves 95% or higher accuracy on clear audio. On noisy or otherwise difficult audio, accuracy may drop to 70-85%.
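To check how a transcript compares with these figures, you can measure it against a manually corrected reference. The sketch below assumes "accuracy" means word-level accuracy, computed as 1 minus the word error rate (WER). This guide does not state how the figures above were measured, so treat the result as a rough comparison only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "please check the insulin dose before the next clinic visit"
transcript = "please check the insulin does before the next clinic visit"
accuracy = 1 - word_error_rate(reference, transcript)
print(f"{accuracy:.0%}")
```

Here one of ten words is wrong ("does" instead of "dose"), so the word accuracy is 90%. Punctuation is not stripped in this sketch, so normalize both texts first if you only care about the words.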
Speaker Identification Not Working
Cause
Speaker diarization (identifying different speakers) can fail when:
- Voices are too similar (same gender, age, accent)
- Poor audio quality makes voices indistinguishable
- Speakers talk over each other frequently
- Multiple people in the same room share one microphone
Solution
- Use separate microphones: Best results come from an individual mic per speaker
- Enable speaker diarization: Make sure to check "Speaker Diarization" when uploading
- Edit speaker names: After transcription, rename "Speaker 1", "Speaker 2", etc. to actual names
- Accept limitations: Very similar voices may be grouped together; you can manually split sections after transcription
Processing Takes Too Long
Normal Processing Time
Expected transcription times:
- Short files (0-10 min): 1-3 minutes
- Medium files (10-30 min): 3-8 minutes
- Long files (30-60 min): 8-15 minutes
- Very long files (60+ min): 15-30 minutes
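The table above can be expressed as a small lookup, which is handy if you track uploads in a script. This is a sketch with hypothetical names. The ranges above overlap at 10, 30, and 60 minutes, and the code assumes those boundary values belong to the shorter bucket.

```python
# (upper bound of file length in minutes, expected (min, max) processing minutes)
# Assumption: a file of exactly 10, 30, or 60 minutes falls in the shorter bucket.
EXPECTED_TIMES = [
    (10, (1, 3)),
    (30, (3, 8)),
    (60, (8, 15)),
    (float("inf"), (15, 30)),
]


def expected_processing_minutes(duration_min: float) -> tuple[int, int]:
    """Return the expected (fastest, slowest) processing time in minutes."""
    if duration_min < 0:
        raise ValueError("duration must not be negative")
    for upper, window in EXPECTED_TIMES:
        if duration_min <= upper:
            return window
    raise ValueError("duration is not a valid number")


def is_overdue(duration_min: float, elapsed_min: float) -> bool:
    """True once a job has run longer than the slowest expected time."""
    return elapsed_min > expected_processing_minutes(duration_min)[1]
```

For example, a 20-minute file that has been processing for 10 minutes counts as overdue, because the expected window for that length is 3-8 minutes.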
If Processing Exceeds Expected Time
- Check status: Refresh the page to see current status
- Wait for email: You'll receive an email notification when the transcript is ready
- Check after 30 minutes: If no email has arrived after 30 minutes, the job may have failed
- Retry: Delete the content and upload again
If transcription consistently fails or takes over 30 minutes, the file itself may be the problem. Try a different file: if that one processes normally, the issue is with the original file; if it also fails, the problem is likely system-wide.
Missing or Incomplete Transcript
Cause
- File has no audio track (video only)
- Audio is completely silent or extremely quiet
- Processing interrupted before completion
Solution
- Verify audio exists:
  - Play the file and confirm that the audio is audible
  - Check the volume levels; speech should be clearly audible
- Check file format:
  - Some video files have separate audio tracks that may not be detected
  - Try converting to MP4 with AAC audio
- Re-upload: Delete the content and upload it again to retry processing
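If you have FFmpeg installed, the audio-track check and the MP4/AAC conversion can be done from a script. The sketch below assumes `ffprobe` and `ffmpeg` are on your PATH and that your FFmpeg build includes the `libx264` encoder. The function names are hypothetical, and VideoToBe does not require this workflow.

```python
import shutil
import subprocess


def audio_probe_command(path: str) -> list[str]:
    """ffprobe arguments that print one line per audio stream in the file."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a",
        "-show_entries", "stream=codec_name",
        "-of", "csv=p=0",
        path,
    ]


def convert_command(src: str, dst: str) -> list[str]:
    """ffmpeg arguments that re-encode to H.264 video and AAC audio."""
    return ["ffmpeg", "-i", src, "-c:v", "libx264", "-c:a", "aac", dst]


def has_audio_track(path: str) -> bool:
    """True if ffprobe reports at least one audio stream."""
    if shutil.which("ffprobe") is None:
        raise RuntimeError("ffprobe not found on PATH")
    result = subprocess.run(
        audio_probe_command(path),
        capture_output=True, text=True, check=True,
    )
    return bool(result.stdout.strip())
```

To convert a file, pass the result of `convert_command("input.mov", "output.mp4")` to `subprocess.run`. A file that passes `has_audio_track` can still be silent, so listening to it remains the most reliable check.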
Translation Issues
Translation to English Not Working
Cause
- Audio is already in English (no translation needed)
- Language not supported by Whisper
- Mixed languages in single file
Solution
- Check language support: Whisper supports 90+ languages including Spanish, French, German, Chinese, Japanese, and more
- Single language per file: Keep each file to one language; with mixed-language content, Whisper will transcribe the dominant language
- Verify translation enabled: Make sure you checked "Translate to English" when uploading
Transcript Has Wrong Language
Cause
Whisper auto-detects the language. It may misidentify the language if:
- Audio quality is very poor
- The file contains multiple languages
- The speaker has a heavy accent
- The audio clip is very short (under 30 seconds)
Solution
- Use the "Translate to English" option to force translation
- Ensure audio is at least 30 seconds long for accurate detection
- Use the initial prompt to specify language context
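For WAV files, the 30-second guideline can be checked with Python's standard `wave` module before uploading. This is a minimal sketch with hypothetical function names. It only reads WAV files, so other formats need a different tool.

```python
import wave

MIN_SECONDS = 30  # threshold taken from the guideline above


def wav_duration_seconds(path: str) -> float:
    """Length of a WAV file in seconds (frame count / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def long_enough_for_detection(path: str, minimum: float = MIN_SECONDS) -> bool:
    """True if the clip meets the minimum length for language detection."""
    return wav_duration_seconds(path) >= minimum
```

A clip that falls short is not guaranteed to be misdetected; it is simply more likely to be, so consider setting language context in the initial prompt for short recordings.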
Best Practices for Accurate Transcription
Before Recording
- Test your microphone and audio levels
- Choose a quiet location
- Close windows and turn off fans/AC if possible
- Use headphones for video calls to reduce echo
During Recording
- Speak clearly at a moderate pace
- Avoid interrupting or talking over others
- State names clearly when introducing people
- Spell out acronyms or technical terms when first used
When Uploading
- Use an initial prompt for technical content
- Enable speaker diarization for multi-person recordings
- Select "Translate to English" if audio is in another language
- Use high-quality audio formats (WAV, FLAC) when possible