Uploading Media
VideoToBe makes it easy to upload audio and video files for transcription. This guide covers supported formats, file size limits, and best practices for optimal results.
Supported File Formats
Audio Formats
VideoToBe supports the following common audio formats:
- MP3 - Most common, good quality-to-size ratio
- WAV - Uncompressed, best quality (large file size)
- M4A - Apple audio format, good compression
- AAC - Advanced audio codec, efficient compression
- FLAC - Lossless compression, excellent quality
- OGG - Open container format, good compression
- WMA - Windows Media Audio
Video Formats
Upload a video file in any of the following formats, and VideoToBe automatically extracts the audio:
- MP4 - Most common video format
- MOV - Apple video format
- AVI - Windows video format
- MKV - Matroska video container
- WEBM - Web-optimized video
- FLV - Flash video (legacy)
- WMV - Windows Media Video
Video to Audio Conversion
When you upload a video file, VideoToBe automatically extracts the audio track for transcription. There's no need to convert video to audio before uploading.
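If you prepare files in bulk, you can catch unsupported formats before you upload. The sketch below classifies a file by its extension, using only the formats listed in this guide. The function name `classify_media` is hypothetical, and the extension is only a hint: VideoToBe's own validation may inspect the file contents and reach a different result.

```python
from pathlib import Path

# Extensions taken from the audio and video format lists in this guide.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".aac", ".flac", ".ogg", ".wma"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".flv", ".wmv"}


def classify_media(filename: str) -> str:
    """Return "audio", "video", or "unsupported" based on the file extension."""
    suffix = Path(filename).suffix.lower()  # case-insensitive: ".MP4" == ".mp4"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return "unsupported"
```

For example, `classify_media("interview.MOV")` returns `"video"`, while `classify_media("slides.pdf")` returns `"unsupported"`.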
File Size and Duration Limits
| Limit Type | Free Plan | Pro Plan |
|---|---|---|
| Max file size | 500MB | 2GB |
| Max duration | 30 minutes | Unlimited |
| Uploads per day | 3 | Unlimited |
Compare plans and upgrade →
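The limits in the table can also be expressed as a small pre-upload check. This is a minimal sketch, not VideoToBe's actual enforcement logic: the names `PlanLimits` and `check_upload` are hypothetical, and it assumes that 1 MB means 1024 × 1024 bytes, which this guide does not specify.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024  # Assumption: binary megabytes; the guide does not say which unit applies.
GB = 1024 * MB


@dataclass(frozen=True)
class PlanLimits:
    max_bytes: int
    max_minutes: Optional[float]    # None means unlimited
    uploads_per_day: Optional[int]  # None means unlimited


# Values copied from the limits table above.
LIMITS = {
    "free": PlanLimits(max_bytes=500 * MB, max_minutes=30, uploads_per_day=3),
    "pro": PlanLimits(max_bytes=2 * GB, max_minutes=None, uploads_per_day=None),
}


def check_upload(plan: str, size_bytes: int, duration_minutes: float,
                 uploads_today: int) -> list[str]:
    """Return a list of limit violations; an empty list means the upload fits the plan."""
    limits = LIMITS[plan]
    problems = []
    if size_bytes > limits.max_bytes:
        problems.append("file size exceeds the plan limit")
    if limits.max_minutes is not None and duration_minutes > limits.max_minutes:
        problems.append("duration exceeds the plan limit")
    if limits.uploads_per_day is not None and uploads_today >= limits.uploads_per_day:
        problems.append("daily upload limit reached")
    return problems
```

A 600 MB, 45-minute file passes on the Pro plan but reports both a size and a duration problem on the Free plan.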
How to Upload Files
Navigate to Your Workspace
Log in to app.videotobe.com and select your workspace from the dropdown.
Start Upload
You have two options:
- Click the Upload button in the top navigation
- Drag and drop files directly onto the page
Both methods work for single or multiple files.
Select Your File
Choose a file from your computer. The file format is validated automatically.
Unsupported File Type
If you see the message "We only accept audio and video files", your file format is not supported. Convert it to MP3 or MP4 first.
Configure Transcription Options
Before uploading, configure optional settings:
- Language: Auto-detect or select manually (90+ languages)
- Translate to English: Convert from any language to English
- Speaker Diarization: Identify multiple speakers (Speaker 1, Speaker 2, etc.)
- Initial Prompt: Provide context for better accuracy (e.g., technical jargon, proper names)
Upload and Wait
Click Transcribe to start the upload and processing. Processing typically takes 2-5 minutes depending on file length.
You can:
- Wait on the page for real-time progress updates
- Close your browser - you'll get an email when transcription is complete
- Upload additional files while waiting (within daily limits)
Best Practices for Better Accuracy
Audio Quality Recommendations
For best transcription accuracy:
- Bitrate: 128 kbps or higher (192 kbps ideal)
- Sample Rate: 44.1 kHz or 48 kHz
- Format: Lossless formats (WAV, FLAC) for critical content
- Background Noise: Minimize as much as possible
- Microphone Distance: Keep speakers close to the microphone
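You can check the sample rate of a WAV recording yourself before uploading. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV files only, so it will not work for MP3 or other compressed formats. The function name `inspect_wav` and the returned keys are hypothetical, chosen for illustration.

```python
import wave

# Sample rates recommended in this guide: 44.1 kHz and 48 kHz.
RECOMMENDED_SAMPLE_RATES = {44100, 48000}


def inspect_wav(path: str) -> dict:
    """Read basic properties of a PCM WAV file and compare them with the guide."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        frame_count = wav_file.getnframes()
        channels = wav_file.getnchannels()
    return {
        "sample_rate_hz": sample_rate,
        "channels": channels,
        "duration_seconds": frame_count / sample_rate,
        # True only when the rate is exactly one of the two recommended values.
        "meets_recommendation": sample_rate in RECOMMENDED_SAMPLE_RATES,
    }
```

The duration it reports is also handy for checking a recording against the duration limit of your plan.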
Recording Environment
- Use a quiet room with minimal echo
- Close windows and turn off fans/AC during recording
- Use a quality external microphone instead of built-in laptop mics
- Use pop filters to reduce plosive sounds (p, b, t)
Content Guidelines
VideoToBe performs best with:
- Clear speech - No heavy accents or mumbling
- One speaker at a time - Overlapping speech reduces accuracy
- Minimal background music - Music interferes with speech recognition
- Standard dialects - Regional dialects may have lower accuracy
Expected Accuracy
With clear audio, VideoToBe achieves 95%+ accuracy. Noisy or poor-quality audio may result in 70-85% accuracy. See Features & Capabilities for more details.
Troubleshooting Upload Issues
File Size Exceeds Limit
If your file is too large:
- Compress the file using tools like HandBrake (video) or Audacity (audio)
- Split into smaller chunks - Divide long recordings
- Upgrade to Pro - Get 2GB file support
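If you decide to split a long recording, it helps to work out the cut points first. The sketch below divides a recording into consecutive segments no longer than a given limit, defaulting to the 30-minute Free plan limit. The function name `plan_segments` is hypothetical, and it only computes start and end times; you still make the actual cuts in a tool such as Audacity, ideally at a pause in speech near each boundary.

```python
def plan_segments(total_minutes: float,
                  max_minutes: float = 30.0) -> list[tuple[float, float]]:
    """Return (start, end) times in minutes for segments no longer than max_minutes."""
    if total_minutes <= 0 or max_minutes <= 0:
        raise ValueError("durations must be positive")
    segments = []
    start = 0.0
    while start < total_minutes:
        # The final segment is shortened so it ends exactly at the recording's end.
        end = min(start + max_minutes, total_minutes)
        segments.append((start, end))
        start = end
    return segments


print(plan_segments(75.0))
# prints [(0.0, 30.0), (30.0, 60.0), (60.0, 75.0)]
```

Keep in mind that each segment counts as a separate upload toward the daily limit on the Free plan.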
Upload Stuck at 0%
If your upload isn't progressing:
- Refresh the page and try again
- Check your internet connection
- Disable browser extensions (ad blockers)
- Try a different browser (Chrome or Firefox recommended)
- Use a wired connection instead of Wi-Fi for large files
See Upload Errors for the complete troubleshooting guide.
Multipart Upload for Large Files
For files larger than 5MB, VideoToBe uses multipart upload to improve reliability:
- Files are split into chunks and uploaded in parallel
- If one chunk fails, only that chunk is retried (not the entire file)
- Faster upload speeds for large files
- Better handling of network interruptions
This happens automatically - no action needed from you.
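For readers curious about how this works, the general technique can be sketched as follows. This is an illustration of multipart upload with per-chunk retry, not VideoToBe's actual client code: the `upload_part` callable is a hypothetical stand-in for a real network request, and the 5 MB part size, worker count, and retry count are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

CHUNK_SIZE = 5 * 1024 * 1024  # Assumption: 5 MB parts; the real part size is not documented here.


def split_into_parts(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the payload into consecutive chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def upload_with_retry(upload_part: Callable[[int, bytes], None],
                      part_number: int, chunk: bytes,
                      max_attempts: int = 3) -> int:
    """Upload one chunk, retrying only this chunk on failure. Returns attempts used."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            upload_part(part_number, chunk)  # hypothetical network call
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise  # give up on this chunk after the final attempt
    raise AssertionError("unreachable")


def multipart_upload(data: bytes,
                     upload_part: Callable[[int, bytes], None],
                     chunk_size: int = CHUNK_SIZE,
                     workers: int = 4) -> list[int]:
    """Upload all chunks in parallel; returns the attempt count for each part, in order."""
    parts = split_into_parts(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(upload_with_retry, upload_part, number, chunk)
            for number, chunk in enumerate(parts, start=1)
        ]
        return [future.result() for future in futures]
```

Because each chunk is retried independently, a brief network drop costs one chunk's worth of re-upload instead of restarting the whole file.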
After Upload: What Happens Next?
- Upload to Cloud Storage: Your file is securely uploaded to Cloudflare R2
- Processing Queue: The file is added to the transcription queue
- Audio Extraction: For video uploads, the audio track is extracted from the video file
- Transcription: Whisper AI processes the audio and generates text
- Post-Processing: Timestamps, speaker labels, and formatting are applied
- Notification: You receive an email when transcription is complete
Typical processing time: 2-5 minutes for most files.
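The sequence above can be summarized as an ordered list of stages, with audio extraction applying to video uploads only. The sketch below is purely illustrative: the `Stage` enum and `stages_for` function are hypothetical names and do not correspond to status values that VideoToBe exposes.

```python
from enum import Enum


class Stage(Enum):
    # Stage names follow the list in this guide; the enum itself is hypothetical.
    UPLOAD = "Upload to Cloud Storage"
    QUEUE = "Processing Queue"
    AUDIO_EXTRACTION = "Audio Extraction"
    TRANSCRIPTION = "Transcription"
    POST_PROCESSING = "Post-Processing"
    NOTIFICATION = "Notification"


def stages_for(is_video: bool) -> list[Stage]:
    """Return the stages a file passes through, in order."""
    # Audio files skip extraction because there is no video track to strip.
    return [stage for stage in Stage
            if is_video or stage is not Stage.AUDIO_EXTRACTION]
```

A video upload passes through all six stages, while an audio upload passes through five.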