Subtitle delivery comes in two forms — burned into the video (hardcoded; rendered into pixels) or delivered as separate tracks (soft; player renders at playback). Each has appropriate use cases. Burn-in works everywhere; soft requires player support but adds toggle and language flexibility. Picking correctly affects pipeline cost, player compatibility, audience experience, and accessibility compliance. This page is the engineering reference for making that choice.
What burn-in subtitles are
Burn-in (hardcoded) subtitles are rendered into the video frame during encoding. The result: subtitles are pixels, indistinguishable from the video itself.
Characteristics:
- Universal compatibility — every player decodes them.
- Always-on — viewer can't turn them off.
- Single language — only one subtitle track per video.
- Permanent — once burned in, they're part of the video.
- Encoded into bitrate — subtitle area in the frame consumes bitrate.
What soft subtitles are
Soft subtitles are separate tracks delivered alongside the video. The player renders them at playback time, overlaying them on the video.
Characteristics:
- Player support required — not all players handle all formats.
- Toggleable — viewer can turn off, change language.
- Multiple languages — multiple subtitle tracks per video.
- Editable — subtitles can be updated without re-encoding video.
- Separate bandwidth — subtitle files are tiny (kilobytes) but add to total delivery.
When burn-in is right
The cases where burn-in is the appropriate choice:
1. Audience needs subtitles always-on:
- Foreign-language films delivered to non-native-language audiences.
- Hearing-impaired audiences who always want captions.
- Some broadcast distribution where soft tracks aren't supported.
2. Player ecosystem doesn't support soft subtitles:
- Legacy players, embedded systems, some smart TVs from before ~2015.
- Custom players that don't implement subtitle handling.
- Streams to social media platforms that don't preserve sidecar subtitles (Instagram, TikTok, etc.).
3. Forced narrative (foreign-language passages):
- Most of the content is in the viewer's language.
- Specific scenes are in foreign language and need translation regardless of viewer preference.
- These scenes get burned-in subtitles; rest of content has soft track.
4. Single delivery format:
- Pipeline produces single output without manifest infrastructure.
- Subtitle delivery would require manifest/track infrastructure not in place.
For these cases, burn-in is the right answer. Universal compatibility justifies the loss of toggle/language flexibility.
When soft is right
The cases where soft subtitles are the appropriate choice:
1. Multi-language content:
- Streaming services with global audiences.
- Multiple language tracks delivered as alternates.
- Per-viewer language selection at playback.
2. Modern streaming infrastructure:
- HLS or DASH delivery with manifest support.
- Players that handle WebVTT or IMSC.
- Web/mobile/smart TV ecosystems.
3. Toggle flexibility:
- Audience expects to control caption visibility.
- Premium streaming UX that respects viewer preferences.
4. Content that may need subtitle updates:
- Live content where subtitles are added/improved over time.
- Content where translation may be revised post-launch.
- DVR content where captions are retroactively added.
5. Accessibility compliance:
- ADA, CVAA, EAA accessibility regulations often expect toggleable captions.
- Soft subtitles satisfy "viewer can turn captions on/off" requirements.
For most premium streaming in 2026, soft subtitles are the right answer. The ecosystem supports them; audience expectations align with them.
ffmpeg burn-in implementation
ffmpeg burn-in via the subtitles filter:
ffmpeg -i input.mp4 -vf "subtitles=subs.srt" -c:a copy output.mp4
The subtitles filter accepts SRT, ASS, SSA, WebVTT, and other subtitle formats as input. The filter renders the subtitles into the video pixels.
For specific font and styling control, use ASS (Advanced SubStation Alpha) format with the filter:
ffmpeg -i input.mp4 -vf "subtitles=subs.ass:force_style='FontName=Arial,FontSize=24,Bold=1'" -c:a copy output.mp4
The force_style option overrides the styles defined in the ASS file (and the defaults used when rendering plain formats like SRT).
For multi-line position control:
ffmpeg -i input.mp4 -vf "subtitles=subs.srt:force_style='Alignment=2,MarginV=50'" output.mp4
Alignment=2 is bottom-center; MarginV=50 is 50 pixels above bottom edge.
ffmpeg soft subtitle delivery
For HLS delivery with soft subtitles, ffmpeg produces WebVTT segments alongside video segments:
ffmpeg -i input.mp4 \
-map 0:v -map 0:a -map 0:s \
-c:v copy -c:a copy -c:s webvtt \
-hls_time 6 -hls_list_size 0 \
-f hls output.m3u8
The -c:s webvtt flag converts subtitles to WebVTT for HLS compatibility. The HLS muxer creates separate subtitle segments.
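The soft track is only selectable if the master playlist ties the video variants to a subtitle group. A minimal sketch of that signaling — playlist names, URIs, and the helper function are hypothetical examples, not ffmpeg output:

```python
def master_playlist(video_uri: str, bandwidth: int,
                    sub_tracks: list[tuple[str, str, str]]) -> str:
    """Build a minimal HLS master playlist with a subtitle group.

    sub_tracks: (name, language, uri) triples; all values are examples.
    """
    lines = ["#EXTM3U"]
    for name, lang, uri in sub_tracks:
        # one EXT-X-MEDIA entry per language, all sharing GROUP-ID "subs"
        lines.append(
            f'#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="{name}",'
            f'LANGUAGE="{lang}",AUTOSELECT=YES,URI="{uri}"'
        )
    # the variant stream opts in to the group via SUBTITLES="subs"
    lines.append(f'#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},SUBTITLES="subs"')
    lines.append(video_uri)
    return "\n".join(lines)

print(master_playlist("video/1080p.m3u8", 5_000_000,
                      [("English", "en", "subs/en.m3u8"),
                       ("Deutsch", "de", "subs/de.m3u8")]))
```

If the EXT-X-MEDIA entries or the SUBTITLES attribute are missing, players play the video fine but never offer the caption picker — which is exactly the manifest-signaling bug described later on this page.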
For DASH:
ffmpeg -i input.mp4 \
-map 0:v -map 0:a -map 0:s \
-c:v copy -c:a copy -c:s webvtt \
-seg_duration 6 \
-f dash output.mpd
Most production pipelines use Shaka Packager or similar packagers for sophisticated subtitle delivery rather than relying on ffmpeg's HLS/DASH muxers.
Bandwidth cost
Burn-in cost:
- Subtitle area in the frame consumes bitrate (text is detail; encoder spends bits on it).
- Typically 5-15% additional bitrate at the same quality vs no subtitles.
Soft subtitle cost:
- WebVTT file: ~1-5 KB per subtitle segment.
- For a 90-minute movie at 6-second segments, ~900 segments × 2-3 KB = ~2-3 MB total.
- Negligible compared to video bandwidth.
For pipelines, soft subtitles are cheaper bandwidth-wise. Burn-in spends bitrate on subtitle text every frame; soft delivers subtitles separately at low cost.
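The arithmetic is easy to sanity-check. A sketch with assumed numbers (5 Mbps video rendition, 10% burn-in overhead, ~2.5 KB per WebVTT segment — all illustrative, not measured values):

```python
DURATION_S = 90 * 60          # 90-minute movie
SEGMENT_S = 6                 # HLS segment length
VIDEO_BPS = 5_000_000         # assumed 5 Mbps video rendition
BURNIN_OVERHEAD = 0.10        # assumed 10% extra bitrate for burned-in text
VTT_SEGMENT_BYTES = 2_500     # assumed ~2.5 KB per WebVTT segment

segments = DURATION_S // SEGMENT_S                              # 900 segments
soft_mb = segments * VTT_SEGMENT_BYTES / 1e6                    # total WebVTT payload
burnin_mb = DURATION_S * VIDEO_BPS / 8 * BURNIN_OVERHEAD / 1e6  # extra video bytes

print(f"soft subtitles:   {soft_mb:.2f} MB total")
print(f"burn-in overhead: {burnin_mb:.1f} MB of extra video")
```

Under these assumptions the soft track costs about 2.25 MB for the whole movie, while the burn-in overhead costs hundreds of MB of extra video per rendition — two orders of magnitude apart.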
Mixed approach: burn-in + soft
Some pipelines combine both:
- Forced narrative burn-in: critical foreign-language passages always shown.
- Soft full subtitles: viewer can enable for full caption coverage.
The video has burn-in for forced sections; soft track has the same content (or more) for the audience that wants full subtitles.
This gives the best of both: critical passages always visible; viewer-controlled caption flexibility for the rest.
Player compatibility matrix
Soft subtitle support across players:
| Player | WebVTT | IMSC/TTML | SRT |
|---|---|---|---|
| AVPlayer (iOS/macOS) | ✓ | ✓ | Limited |
| hls.js | ✓ | Limited | Limited |
| Shaka Player | ✓ | ✓ | Limited |
| ExoPlayer (Android) | ✓ | ✓ | ✓ |
| dash.js | ✓ | ✓ | Limited |
| Older smart TVs | Variable | Variable | Variable |
| Embedded set-top boxes | Variable | Variable | Variable |
For broad compatibility, WebVTT is the safest choice. IMSC for premium streaming where richer styling is needed.
For audiences expected on legacy or embedded systems, burn-in is the safer choice.
Forced narrative tagging
Soft subtitles support forced narrative tagging — captions that should always display regardless of viewer preference (foreign-language passages, on-screen text).
WebVTT itself has no standard in-file forced flag (a NOTE block is a comment; players ignore it). Forced display is signaled at the container or manifest level. In HLS, the forced cues live in their own rendition, marked in the master playlist:
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English (Forced)",LANGUAGE="en",FORCED=YES,AUTOSELECT=YES,URI="subs/en_forced.m3u8"
In DASH, the subtitle AdaptationSet carries a Role descriptor identifying it as forced subtitles; in MP4 and Matroska, forced is a per-track flag.
Most packagers preserve forced flags through repackaging, and players that implement forced handling honor them; support varies (see Bug 4 below).
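The burn-in vs soft decision per output can be sketched as a tiny config check — a hypothetical illustration, not the workflow YAML described later on this page:

```python
def delivery_mode(needs_toggle: bool, legacy_players: bool, forced_only: bool) -> str:
    """Pick a subtitle delivery mode from the decision criteria above.

    All parameter names are hypothetical; the rules mirror this page's guidance.
    """
    if legacy_players:
        return "burn-in"            # universal compatibility wins
    if forced_only and not needs_toggle:
        return "burn-in"            # always-on forced passages
    return "soft"                   # modern streaming default

print(delivery_mode(needs_toggle=True, legacy_players=False, forced_only=False))
```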
For audiences that should always see foreign-language translations regardless of caption preference, forced narrative tagging is the soft-subtitle answer to burn-in's "always on" property.
Operational considerations
Things that matter for subtitle delivery in production:
- Audience expectations — premium streaming expects toggleable captions; some markets expect always-on.
- Compliance requirements — ADA, CVAA, EAA regulations on caption availability and quality.
- Pipeline complexity — soft subtitles require manifest infrastructure; burn-in is simpler.
- Editorial workflow — soft subtitles can be revised post-launch; burn-in requires re-encoding.
- Player diversity — test on actual target players; subtitle handling has player-specific quirks.
- Multi-language scaling — soft subtitles scale linearly (one track per language); burn-in scales as multiple complete encodes (one per language).
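The scaling difference in the last point is worth making concrete. With assumed counts (a 6-rendition ABR ladder, 5 subtitle languages — both illustrative):

```python
renditions = 6   # assumed ABR ladder size
languages = 5    # assumed subtitle languages

soft_encodes = renditions                 # one language-neutral video ladder
soft_tracks = languages                   # plus one small text track per language

burnin_encodes = renditions * languages   # every rendition re-encoded per language

print(f"soft: {soft_encodes} encodes + {soft_tracks} tracks")
print(f"burn-in: {burnin_encodes} encodes")
```

Adding a sixth language costs one more text track with soft delivery, but six more full video encodes with burn-in.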
Common subtitle delivery bugs
Things that go wrong:
Bug 1: Burn-in subtitle quality issues.
Subtitles are rasterized at the resolution of the frame they are burned into; if burn-in happens on a low-resolution intermediate that is later upscaled, the text looks pixelated. Burn in at the highest output resolution so that any downstream scaling is downscaling, which preserves legibility.
Bug 2: Soft subtitle player rendering differences.
The same WebVTT file renders differently on different players — font choice, position, sizing. Test on actual targets before committing to specific styling.
Bug 3: Multi-language manifest signaling errors.
HLS or DASH manifests sometimes mis-identify languages or codec strings. Players don't show the language picker correctly. Validate manifest output.
Bug 4: Forced narrative not honored.
Players that don't recognize forced narrative tags don't show the always-on subtitles. Test on actual players; consider burn-in for content where forced narrative reliability matters.
Bug 5: Subtitle timing drift.
Long content with imprecise timing accumulates drift. By the end of a 90-minute movie, subtitles can be seconds off. Validate alignment throughout, not just at the start.
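A frequent cause of accumulated drift is a frame-rate mismatch between authoring and playback — for example, cues authored against 25 fps material played at 23.976 fps. A sketch of the check, using WebVTT-style timestamps (function names are illustrative):

```python
def parse_ts(ts: str) -> float:
    """'HH:MM:SS.mmm' (WebVTT style) -> seconds."""
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

def rate_drift(t_seconds: float, authored_fps: float, playback_fps: float) -> float:
    """How far (seconds) a cue at time t lands from the audio when frame rates differ."""
    return t_seconds * (authored_fps / playback_fps) - t_seconds

end = parse_ts("01:30:00.000")                 # 90-minute mark
print(f"{rate_drift(end, 25.0, 23.976):.1f}")  # ≈ 230.6 s off by the credits
```

A few milliseconds of mismatch per cue is invisible at minute one; the check has to run against landmarks near the end of the content as well.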
Bug 6: Special character encoding issues.
Non-Latin scripts (Japanese, Chinese, Arabic, Cyrillic) need correct UTF-8 throughout the pipeline. Some old tools mangle non-Latin content. Verify on representative content.
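A common mangling pattern is UTF-8 bytes mis-decoded as Latin-1/CP1252 somewhere in the pipeline, which turns CJK text into strings like "æ—¥æœ¬èªž". A heuristic sketch for catching it (an assumption-laden check, not a general-purpose detector):

```python
def looks_mojibake(s: str) -> bool:
    """Heuristic: flag UTF-8 text that was mis-decoded as Latin-1."""
    try:
        raw = s.encode("latin-1")    # only possible if every char fits in one byte
        fixed = raw.decode("utf-8")  # valid UTF-8 after the round trip is suspicious
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False                 # genuine non-Latin text, or not UTF-8 debris
    return fixed != s                # round trip changed the text: likely mojibake

damaged = "日本語".encode("utf-8").decode("latin-1")  # simulate the pipeline bug
print(looks_mojibake(damaged), looks_mojibake("日本語"), looks_mojibake("Hello"))
```

Running a check like this on representative non-Latin content catches the bug before it ships; correctly decoded text and plain ASCII both pass through unflagged.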
What MpegFlow does with subtitles
MpegFlow's DAG runtime expresses subtitle handling as discrete stages within the workflow. Soft delivery is the default — caption extraction (CcextractorExecutor) feeds caption-format conversion which feeds the HLS/DASH packaging stages; the partitioner persists each as a row in job_stages with explicit dependency tracking and per-stage retry. For burn-in, the FfmpegExecutor stage runs the subtitles filter as part of the encode (subtitle pixels rendered into the rendition); the workflow YAML selects burn-in vs soft per output.
For multi-language content, the pipeline produces per-language soft tracks as parallel sibling stages; the packaging stage emits manifest signaling that identifies each language for player selection. Cross-format consistency (the same content emitted as both WebVTT for HLS and IMSC for DASH) comes from a shared upstream source feeding both packaging branches.
Forced-narrative tagging is not currently a pipeline-native concept. The pipeline preserves WebVTT/IMSC source markers if they're present in the input caption file (passthrough), but it does not inject, validate, or normalize forced flags as a first-class operation. Customers needing reliable forced-narrative behavior either author it correctly upstream or use burn-in for the forced segments — pipeline-level forced-narrative authoring is on the backlog.
SCTE-35 markers in caption-adjacent content are passthrough-only at the muxing/packaging layer; the pipeline does not parse or generate SCTE-35 cue messages, and ad-marker–driven caption decisions are not a runtime feature.
The strict-broker security model handles subtitle content the same as any payload — workers carry no ambient credentials; content access is via short-lived presigned URLs scoped per stage; access is disposed on completion.
For customers building their first multi-language subtitle workflow, the conversation typically focuses on translation/localization workflow (out-of-pipeline; customer-managed), file format selection (WebVTT for web reach, IMSC for premium where supported), and forced-narrative handling (which scenes need always-on translation, and whether to author them as burn-in).
The general guidance: soft subtitles are the modern default for premium streaming. Burn-in is for specific cases (forced narrative, legacy compatibility, social media delivery). Match the delivery method to your audience expectations and accessibility requirements rather than choosing arbitrarily.