Transcription Problems

Having issues with transcription quality or processing? This guide helps you diagnose and fix common transcription problems.

Transcription Failed

Error Message

Transcription job failed: [error]

Cause

Transcription can fail for several reasons:

Corrupted or invalid audio/video file
Unsupported codec or container format
File too large for processing (over 2GB)
Modal.com transcription service error
Network timeout during processing
R2 storage access issues

Solution

Verify file integrity:
- Try playing the file in VLC or another media player
- If it won't play, the file is likely corrupted; try re-recording or re-exporting it
Check file format:
- Use common formats: MP4 (H.264), MP3, WAV
- Avoid exotic codecs or uncommon container formats
Reduce file size:
- If the file is over 2GB, compress it using HandBrake or a similar tool
- For very long recordings, consider splitting into chunks
Retry the upload:
- Delete the failed content and upload again
- Failed attempts don't count toward your daily limit
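
The format and size checks above can be run locally before uploading. The following is a minimal sketch, not VideoToBe's own validation: the function name `preflight_check` is hypothetical, the 2GB limit is interpreted as 2 * 1024**3 bytes, and the extension list simply mirrors the common formats recommended in this guide.

```python
import os

# Assumptions for illustration: the 2GB limit comes from this guide and is
# interpreted as 2 * 1024**3 bytes; the extension set mirrors the "common
# formats" recommended above and is not an official allow-list.
MAX_BYTES = 2 * 1024**3
COMMON_EXTENSIONS = {".mp4", ".mp3", ".wav"}


def preflight_check(path: str) -> list[str]:
    """Return a list of likely problems with the file (empty if none found)."""
    if not os.path.isfile(path):
        return [f"file not found: {path}"]

    problems = []
    size = os.path.getsize(path)
    if size == 0:
        problems.append("file is empty")
    elif size > MAX_BYTES:
        problems.append("file is over 2GB; compress or split it")

    ext = os.path.splitext(path)[1].lower()
    if ext not in COMMON_EXTENSIONS:
        problems.append(f"uncommon extension {ext!r}; consider MP4, MP3, or WAV")
    return problems
```

An empty result only means the file passed these basic checks. A file with a common extension can still contain an unsupported codec, so playing it in a media player remains the more reliable test.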

Code Reference

Transcription processing happens in scribe/run_job.py, using Whisper AI via Modal.com.

Poor Transcription Accuracy

Cause

Low accuracy can result from:

Poor audio quality (background noise, low volume, echo)
Heavy accents or dialects
Technical jargon or uncommon terminology
Multiple overlapping speakers
Fast speech or mumbling
Non-standard language usage (slang, abbreviations)

Solution

1. Use Initial Prompt

Provide context to improve accuracy when uploading:

Technical terms: "This is a software engineering discussion about React, TypeScript, and API endpoints"
Names: "Participants include Dr. Sarah Johnson, Michael Chen, and Emma Williams"
Industry jargon: "Medical consultation discussing diabetes, insulin resistance, and HbA1c levels"
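
If you upload many files on similar topics, it can help to assemble the prompt from reusable pieces. The sketch below is illustrative only: `build_initial_prompt` is a hypothetical helper, and the sentence templates are modeled on the examples above. The open-source Whisper package accepts this kind of text through the `initial_prompt` argument of `transcribe`; whether VideoToBe forwards the text unchanged is an assumption.

```python
def build_initial_prompt(topic: str = "", names=(), terms=()) -> str:
    """Combine a topic, participant names, and key terms into one prompt string."""
    parts = []
    if topic:
        parts.append(f"This is {topic}.")
    if names:
        parts.append("Participants include " + ", ".join(names) + ".")
    if terms:
        parts.append("Key terms: " + ", ".join(terms) + ".")
    return " ".join(parts)


prompt = build_initial_prompt(
    topic="a medical consultation",
    names=["Dr. Sarah Johnson"],
    terms=["diabetes", "HbA1c"],
)
print(prompt)
```

Keep the prompt short and focused on spellings the model is likely to get wrong, such as names, acronyms, and product terms.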

2. Improve Audio Quality

Use external microphones instead of built-in laptop mics
Record in quiet environments
Position microphone 6-12 inches from speaker
Avoid background noise (fans, air conditioning, traffic)
Use audio filters to reduce noise if needed

3. Speaker Considerations

Speak clearly at normal pace
Avoid talking over each other (enable speaker diarization to help)
Pause between sentences
Spell out uncommon names or terms when first mentioned

4. Post-Processing

Edit the transcript directly in VideoToBe after completion
Use the Chat feature to ask AI to fix specific errors
Download and edit in your preferred text editor

Expected Accuracy

VideoToBe uses OpenAI's Whisper model, which typically achieves 95%+ accuracy with clear audio. Noisy or difficult audio may result in 70-85% accuracy.
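
To compare these figures against your own recordings, you can measure accuracy on a short passage that you have corrected by hand. The sketch below assumes accuracy is read as 1 minus the word error rate (WER), a common convention. This guide does not state how its percentages were measured, and `word_error_rate` is a hypothetical helper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the quick brown fox", "the quick brown box")
print(f"accuracy: {1 - wer:.0%}")
```

This comparison treats punctuation attached to a word as part of the word, so strip punctuation from both texts first if you want to score only the spoken words.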

Speaker Identification Not Working

Cause

Speaker diarization (identifying different speakers) can fail when:

Voices are too similar (same gender, age, accent)
Poor audio quality makes voices indistinguishable
Speakers talk over each other frequently
Multiple people in the same room share one microphone

Solution

Use separate microphones: The best results come from an individual mic per speaker
Enable speaker diarization: Make sure to check "Speaker Diarization" when uploading
Edit speaker names: After transcription, rename "Speaker 1", "Speaker 2", etc. to actual names
Accept limitations: Very similar voices may be grouped together - you can manually split sections after transcription

Processing Takes Too Long

Normal Processing Time

Expected transcription times:

Short files (0-10 min): 1-3 minutes
Medium files (10-30 min): 3-8 minutes
Long files (30-60 min): 8-15 minutes
Very long files (60+ min): 15-30 minutes
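
The ranges above can be turned into a simple lookup for deciding when a job is overdue. This is a sketch under stated assumptions: both function names are hypothetical, and because the listed ranges share their boundary values (a 10-minute file appears in two rows), the upper bound of each row is treated as inclusive.

```python
def expected_processing_minutes(duration_min: float) -> tuple[int, int]:
    """Return the (min, max) expected processing time for a recording length."""
    if duration_min < 0:
        raise ValueError("duration must not be negative")
    if duration_min <= 10:
        return (1, 3)
    if duration_min <= 30:
        return (3, 8)
    if duration_min <= 60:
        return (8, 15)
    return (15, 30)


def is_overdue(duration_min: float, elapsed_min: float) -> bool:
    """True once a job has run longer than the upper end of its expected range."""
    return elapsed_min > expected_processing_minutes(duration_min)[1]


print(expected_processing_minutes(45))
print(is_overdue(duration_min=45, elapsed_min=20))
```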

If Processing Exceeds Expected Time

Check status: Refresh the page to see current status
Wait for email: You'll receive an email notification when the transcript is ready
Check after 30 minutes: If no email after 30 minutes, the job may have failed
Retry: Delete the content and upload again

If transcription consistently fails or takes over 30 minutes, there may be an issue with the file. Try a different file to rule out system-wide problems.

Missing or Incomplete Transcript

Cause

File has no audio track (video only)
Audio is completely silent or extremely quiet
Processing interrupted before completion

Solution

Verify audio exists:
- Play the file and ensure audio is audible
- Check volume levels - should be clearly audible
Check file format:
- Some video files have separate audio tracks that may not be detected
- Try converting to MP4 with AAC audio
Re-upload: Delete and upload again to retry processing
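
For WAV recordings, the "is the audio audible" check can be approximated with Python's standard library. The sketch below reports the peak level in dBFS for 16-bit PCM WAV files only. `peak_dbfs` is a hypothetical helper, it reads the whole file into memory, and any threshold you compare against (for example, treating a peak below -40 dBFS as "extremely quiet") is an assumption rather than a VideoToBe rule.

```python
import array
import math
import sys
import wave


def peak_dbfs(path: str) -> float:
    """Return the peak level of a 16-bit PCM WAV file in dBFS (-inf if silent)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")
    # 32768 is full scale for signed 16-bit audio
    return 20 * math.log10(peak / 32768)
```

A result of negative infinity means every sample is zero, which suggests the recording captured no audio at all. For other formats, convert to WAV first or check the levels in an audio editor.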

Translation Issues

Translation to English Not Working

Cause

Audio is already in English (no translation needed)
Language not supported by Whisper
Mixed languages in a single file

Solution

Check language support: Whisper supports 90+ languages including Spanish, French, German, Chinese, Japanese, and more
Single language per file: For mixed-language content, Whisper will transcribe the dominant language
Verify translation enabled: Make sure you checked "Translate to English" when uploading

Transcript Has Wrong Language

Cause

Whisper auto-detects language. It may misidentify if:

Audio quality is very poor
File contains multiple languages
Speaker has heavy accent
Very short audio clip (under 30 seconds)

Solution

Use the "Translate to English" option to force translation
Ensure audio is at least 30 seconds long for accurate detection
Use the initial prompt to specify the language context

Best Practices for Accurate Transcription

Before Recording

Test your microphone and audio levels
Choose a quiet location
Close windows and turn off fans/AC if possible
Use headphones for video calls to reduce echo

During Recording

Speak clearly at moderate pace
Avoid interrupting or talking over others
State names clearly when introducing people
Spell out acronyms or technical terms when first used

When Uploading

Use the initial prompt for technical content
Enable speaker diarization for multi-person recordings
Select "Translate to English" if audio is in another language
Use high-quality audio formats (WAV, FLAC) when possible
