TTML — Timed Text Markup Language — is the W3C XML-based caption format designed for rich styling, precise positioning, and broadcast-grade caption delivery. IMSC (Internet Media Subtitles and Captions) is a constrained profile of TTML that's the standard for streaming caption delivery in DASH and broadcast contexts. Where WebVTT is the simple, browser-native caption format, TTML/IMSC is the heavy-duty option for premium streaming and broadcast workflows. This page is the engineering reference.
What TTML is
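TTML is a W3C standard for timed text. Early drafts were published under the name DFXP (Distribution Format Exchange Profile), and that name persists in older tooling. TTML1 became a W3C Recommendation in 2010 and TTML2 followed in 2018. Production deployments rarely target full TTML; they target an IMSC profile built on one of those two versions.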
TTML is XML-based. A simple TTML file:
<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling"
    xml:lang="en">
  <head>
    <styling>
      <style xml:id="defaultStyle" tts:fontFamily="proportionalSansSerif"
             tts:fontSize="18px" tts:color="white" tts:backgroundColor="black"/>
    </styling>
    <layout>
      <region xml:id="bottom" tts:origin="20% 80%" tts:extent="60% 20%"
              tts:displayAlign="center"/>
    </layout>
  </head>
  <body>
    <div>
      <p style="defaultStyle" region="bottom" begin="00:00:00.000" end="00:00:04.000">
        Welcome to the engineering reference.
      </p>
      <p style="defaultStyle" region="bottom" begin="00:00:04.500" end="00:00:08.000">
        This is a sample TTML file.
      </p>
    </div>
  </body>
</tt>
The structure: a <head> element (with <styling> for CSS-like style definitions and <layout> for region definitions), then a <body> whose <p> elements carry the caption text and timing. Each <p> refers back to the head by ID through its style and region attributes.
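To make that structure concrete, here is a minimal sketch in Python that walks a TTML document with the standard library and pulls out the styles, regions, and cues. The function name `summarize_ttml` and the trimmed `SAMPLE_TTML` string are illustrative assumptions, and the sketch ignores inherited timing, nested <span> styling, and everything else a real renderer has to handle.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
TTS_NS = "http://www.w3.org/ns/ttml#styling"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Trimmed copy of the sample document (XML declaration omitted for brevity).
SAMPLE_TTML = """
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling"
    xml:lang="en">
  <head>
    <styling>
      <style xml:id="defaultStyle" tts:fontSize="18px" tts:color="white"/>
    </styling>
    <layout>
      <region xml:id="bottom" tts:origin="20% 80%" tts:extent="60% 20%"/>
    </layout>
  </head>
  <body>
    <div>
      <p style="defaultStyle" region="bottom" begin="00:00:00.000" end="00:00:04.000">
        Welcome to the engineering reference.
      </p>
    </div>
  </body>
</tt>
"""


def summarize_ttml(ttml_text):
    """Return the styles, region IDs, and cues found in a TTML document."""
    root = ET.fromstring(ttml_text.strip())

    # Collect tts:* attributes of each <style>, keyed by its xml:id.
    styles = {}
    for style in root.iter(f"{{{TT_NS}}}style"):
        styles[style.get(XML_ID)] = {
            name.split("}", 1)[1]: value
            for name, value in style.attrib.items()
            if name.startswith(f"{{{TTS_NS}}}")
        }

    regions = [region.get(XML_ID) for region in root.iter(f"{{{TT_NS}}}region")]

    # Each <p> is one cue; whitespace inside the paragraph is collapsed.
    cues = []
    for p in root.iter(f"{{{TT_NS}}}p"):
        cues.append({
            "begin": p.get("begin"),
            "end": p.get("end"),
            "style": p.get("style"),
            "region": p.get("region"),
            "text": " ".join("".join(p.itertext()).split()),
        })
    return {"styles": styles, "regions": regions, "cues": cues}
```

Calling `summarize_ttml(SAMPLE_TTML)` returns a dictionary whose `cues` entries point at style and region IDs defined in the head, which is the same indirection a player resolves before rendering.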
The XML format is more complex than WebVTT's text format but considerably more capable: styling is CSS-like (typography, colors, outline effects, animation), regions are positioned with explicit origin and extent values, and metadata is structured.
IMSC — the streaming profile
TTML's full spec is too permissive for interoperable production use. IMSC (Internet Media Subtitles and Captions) is a constrained profile that defines exactly which TTML features are supported. Two relevant IMSC profiles:
- IMSC 1.x (Text Profile) — text-based captions with constrained styling. Lightweight, good for streaming.
- IMSC 1.x (Image Profile) — captions delivered as images (PNG bitmaps in the timed text container). Used for character-based languages (Japanese, Chinese), scripts that don't render well in standard fonts, or when exact visual reproduction is critical.
IMSC 1.0.1, a profile of TTML1, is the version most production deployments use. IMSC 1.1 (W3C Recommendation, 2018) is based on TTML2 and adds capabilities such as ruby annotations. IMSC 1.2 (W3C Recommendation, 2020) extends further; production adoption is still rolling out.
Major streaming services and broadcasters that use IMSC:
- Netflix — IMSC 1.0.1 is the primary subtitle format internally; converted to other formats per delivery target.
- BBC — IMSC for iPlayer streaming.
- HbbTV — IMSC is the standardized caption format.
- DASH-IF — IMSC is the recommended caption format for DASH delivery.
For premium streaming pipelines that need broadcast-grade caption capability, IMSC is the answer.
TTML/IMSC capabilities
What TTML/IMSC offers beyond WebVTT:
Rich styling:
- Full color control (foreground, background, edge, shadow).
- Text decoration (underline, strikethrough, overline).
- Font family with fallback chains.
- Font weight and italic with precise control.
- Outline and edge effects (essential for captions over video to ensure readability).
- Animation support (fades, scrolls).
Precise positioning:
- Pixel-precise region positioning.
- Multi-region layouts for complex caption arrangements.
- Per-cue region overrides.
- Vertical text orientation (CJK).
Multilingual handling:
- Per-paragraph language tags.
- Text directionality (left-to-right, right-to-left).
- Bidirectional text mixing.
- Ruby annotations (CJK furigana).
Metadata:
- Per-cue speaker identification.
- Sound effect descriptions for accessibility.
- Cue role attribution (caption, subtitle, description).
- Forced-narrative flagging.
The depth of capability matters most for accessibility-critical content (where delivery specifications demand precise control over presentation) and broadcast workflows (where regulatory or contractual requirements specify exact rendering).
TTML/IMSC in DASH
DASH carries IMSC subtitles via AdaptationSets:
<AdaptationSet mimeType="application/ttml+xml" lang="en" id="3">
  <Representation id="ttml-en" bandwidth="0">
    <BaseURL>subs/en.xml</BaseURL>
  </Representation>
</AdaptationSet>
For segmented IMSC (live or long-form content), the subtitles are delivered as fragmented MP4 segments containing IMSC payloads:
<AdaptationSet mimeType="application/mp4" codecs="stpp.ttml.im1t" lang="en" id="3">
  <SegmentTemplate ... media="subs/en/seg-$Number$.m4s" .../>
</AdaptationSet>
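The codecs="stpp.ttml.im1t" identifier signals the IMSC 1 Text Profile: stpp is the ISO BMFF sample entry for XML subtitles, and im1t is the short profile code covering IMSC 1.0 and 1.0.1 text. Players parse the IMSC document carried in each segment.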
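As a rough illustration of how a packager or player-side tool might interpret that identifier, the Python sketch below splits the codecs string and looks up the profile code. The function name `describe_subtitle_codecs` is hypothetical, and the table lists only the short codes I understand to be registered in the W3C TTML profile registry for IMSC 1 and IMSC 1.1; treat it as an assumption to check against the registry rather than a complete list. Combined profile expressions are not handled.

```python
# Short profile codes, assumed from the W3C TTML profile registry.
PROFILE_NAMES = {
    "im1t": "IMSC 1 Text",
    "im1i": "IMSC 1 Image",
    "im2t": "IMSC 1.1 Text",
    "im2i": "IMSC 1.1 Image",
}


def describe_subtitle_codecs(codecs):
    """Describe a DASH/HLS codecs value such as 'stpp.ttml.im1t'."""
    parts = codecs.strip().split(".")
    if parts[0] != "stpp":
        raise ValueError(f"not an XML subtitle sample entry: {codecs!r}")
    if len(parts) == 1:
        # Bare 'stpp' says "TTML in ISO BMFF" without naming a profile.
        return "TTML in ISO BMFF (profile not signalled)"
    if len(parts) != 3 or parts[1] != "ttml":
        raise ValueError(f"unexpected codecs layout: {codecs!r}")
    code = parts[2]
    return PROFILE_NAMES.get(code, f"unrecognized profile code {code!r}")
```

A lookup like this is useful mainly as a guard: a pipeline can refuse to publish a manifest whose signalled profile does not match what the caption stage actually produced.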
DASH-IF interoperability tests cover IMSC delivery, and the major DASH players (dash.js, Shaka Player, ExoPlayer) all ship TTML/IMSC support, though feature coverage differs between them.
TTML/IMSC in HLS
HLS support for IMSC is more limited than DASH. The HLS spec primarily uses WebVTT for subtitles. IMSC in HLS is possible via:
- fMP4 IMSC segments — same fragmented MP4 IMSC delivery as DASH, referenced from HLS subtitle media playlists.
- Native iOS/tvOS support — Apple's AVPlayer supports IMSC playback in HLS.
For HLS-only delivery to non-Apple devices, WebVTT is more universally supported. For HLS delivery that includes Apple devices (especially tvOS, where IMSC is more polished), IMSC works alongside WebVTT.
Most pipelines that need IMSC ship it primarily for DASH delivery, with WebVTT as the HLS subtitle format. The dual-format approach is operational reality for premium streaming.
TTML/IMSC vs WebVTT
The strategic comparison:
| Dimension | WebVTT | TTML/IMSC |
|---|---|---|
| Format | Text-based | XML |
| Styling capability | Modest (inline tags + basic CSS) | Rich (full CSS-like, animations) |
| Positioning capability | Modest (cue settings) | Precise (pixel-level regions) |
| Browser support | Native | Via JavaScript polyfill |
| HLS support | Native | Limited (Apple ecosystem primarily) |
| DASH support | Yes (text/vtt sidecar, or wvtt in fMP4) | Native (recommended by DASH-IF) |
| Use cases | Web streaming, mass-market | Premium streaming, broadcast |
| File size | Smaller | Larger (XML overhead) |
| Accessibility features | Basic | Rich (multiple roles, descriptions, etc.) |
For most consumer streaming, WebVTT is sufficient. For premium streaming and broadcast, IMSC adds capability that matters.
The pragmatic 2026 answer for premium streaming services: ship IMSC for DASH delivery; ship WebVTT for HLS delivery; convert from one master format internally. Many services author in IMSC (richer source format) and convert to WebVTT for HLS.
When IMSC is required
The cases where IMSC is mandatory or strongly preferred:
- Broadcast delivery (DVB, HbbTV) — IMSC is the standardized caption format for these ecosystems.
- Premium streaming (Netflix-tier) — IMSC's capability matters for accessibility, multi-language support, and rendering consistency across devices.
- Accessibility-critical content — IMSC's role/description support is more developed than WebVTT's.
- CJK content — Image profile IMSC handles Japanese, Chinese, Korean text rendering more reliably than relying on player font support.
- Content with extensive on-screen text — when captions need to coexist with foreground text, precise positioning matters.
When WebVTT is sufficient
The cases where WebVTT is the right answer:
- Web-first streaming — browser native support is the lowest-friction path.
- Mass-market content — basic captions for general audience.
- HLS-only delivery — HLS's ecosystem assumes WebVTT.
- Operational simplicity — WebVTT's text format is easier to author, debug, and version-control.
- Mobile-first delivery — WebVTT renders well on mobile players.
For most consumer streaming pipelines, WebVTT is sufficient. IMSC becomes the right tool once WebVTT's styling, positioning, or metadata limits block a concrete requirement.
Authoring and conversion
IMSC content is typically authored in subtitle production tools (EZTitles, WinCaps, Subtitle Edit, professional captioning workstations). The output is XML files conforming to the IMSC profile.
Conversion between IMSC and WebVTT is asymmetric, and only one direction loses information:
- IMSC → WebVTT: rich styling reduces to inline tags; precise positioning approximates to cue settings; metadata not directly representable is dropped.
- WebVTT → IMSC: the conversion is mechanical (WebVTT capabilities are a subset of IMSC's).
For pipelines that produce both formats, authoring in IMSC and converting to WebVTT preserves the most capability. Authoring in WebVTT and converting to IMSC works but doesn't gain you IMSC's richer capabilities.
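The lossy direction can be sketched in a few lines of Python. This is a deliberately naive converter, not a production tool: the function names are illustrative, it understands only HH:MM:SS(.fraction) clock times on each <p>, and it discards styling, regions, and metadata entirely, which is exactly the information loss described above.

```python
import xml.etree.ElementTree as ET

TT_NS = "{http://www.w3.org/ns/ttml}"


def clock_to_seconds(value):
    """Parse an 'HH:MM:SS(.fraction)' clock time. Offset and frame-based times are not handled."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def vtt_timestamp(seconds):
    """Format seconds as a WebVTT 'HH:MM:SS.mmm' timestamp."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def ttml_to_webvtt(ttml_text):
    """Convert explicitly timed <p> elements to WebVTT cues, dropping everything else."""
    root = ET.fromstring(ttml_text.strip())
    lines = ["WEBVTT", ""]
    for p in root.iter(TT_NS + "p"):
        begin, end = p.get("begin"), p.get("end")
        if begin is None or end is None:
            continue  # timing inherited from ancestors is out of scope here
        # Collapse whitespace; <br/> and <span> styling are flattened away.
        text = " ".join("".join(p.itertext()).split())
        # WebVTT cue text must escape '&' and '<'.
        text = text.replace("&", "&amp;").replace("<", "&lt;")
        start = vtt_timestamp(clock_to_seconds(begin))
        stop = vtt_timestamp(clock_to_seconds(end))
        lines += [f"{start} --> {stop}", text, ""]
    return "\n".join(lines)
```

A real converter would also map tts:textAlign and region geometry onto WebVTT cue settings such as line, position, and align where an approximation exists, and would log whatever it had to drop so the loss is visible to operators.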
Operational considerations
Things that matter for production IMSC:
- Validation against IMSC profile — TTML-permissive content may not validate as IMSC. Use IMSC validators to ensure output conforms.
- Player compatibility testing — IMSC players exist but vary in implementation completeness. Test on actual target devices.
- Font handling — IMSC font specifications need fallback chains because not all players have all fonts.
- Image profile delivery size — image profile IMSC produces larger files than text profile due to embedded PNG images. Plan delivery bandwidth accordingly.
- Live IMSC — segmented IMSC delivery for live works but is more complex than live WebVTT. Few production pipelines have done this at scale.
- Forced-narrative subtitles — IMSC supports explicit forced-narrative metadata; ensure the player respects it for content with foreign-language dialogue.
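Some of these checks can run before a file ever reaches a full validator. The Python sketch below is a hypothetical pre-flight pass, not an IMSC conformance checker: it only confirms that the document is well-formed, that the root declares xml:lang (which TTML requires), that every region reference resolves, and that explicitly timed paragraphs end after they begin. The function name `preflight` and the message wording are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TT_NS = "{http://www.w3.org/ns/ttml}"
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def clock_to_seconds(value):
    """Parse 'HH:MM:SS(.fraction)'; raises ValueError for other time syntaxes."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def preflight(ttml_text):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    try:
        root = ET.fromstring(ttml_text.strip())
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != TT_NS + "tt":
        problems.append("root element is not <tt> in the TTML namespace")
    if root.get(XML_NS + "lang") is None:
        problems.append("missing xml:lang on the root element")

    region_ids = {r.get(XML_NS + "id") for r in root.iter(TT_NS + "region")}
    for index, p in enumerate(root.iter(TT_NS + "p"), start=1):
        region = p.get("region")
        if region is not None and region not in region_ids:
            problems.append(f"p #{index}: unknown region {region!r}")
        begin, end = p.get("begin"), p.get("end")
        if begin is None or end is None:
            continue  # timing may be inherited or use dur; not checked here
        try:
            if clock_to_seconds(end) <= clock_to_seconds(begin):
                problems.append(f"p #{index}: end {end} is not after begin {begin}")
        except ValueError:
            continue  # non-clock time expression; left to a real validator
    return problems
```

A pass like this catches the cheap mistakes early; profile conformance (permitted features, units, and attribute values) still needs a dedicated IMSC validator and testing on target devices.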
What MpegFlow does with TTML/IMSC
MpegFlow's DAG runtime expresses IMSC handling as discrete stages. The caption-conversion node runs on an FfmpegExecutor worker via the libavfilter caption path (CaptionFormat::Smptett for SMPTE-TT/IMSC variants), with the partitioner persisting each stage to job_stages and dependency tracking ensuring downstream packaging stages wait for upstream caption emission. Per-stage retry handles transient failures.
For pipelines requiring both IMSC and WebVTT (typical premium streaming setup), the workflow runs parallel sibling caption-conversion stages from the same upstream source — one emitting IMSC for DASH packaging, one emitting WebVTT for HLS packaging. Cross-stage data flow wires each into its respective packager; sibling cancellation propagates if a fatal upstream failure invalidates the dependent encodes.
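The shape of that workflow can be illustrated with a generic dependency graph. The Python sketch below is not MpegFlow's runtime or API; the stage names are hypothetical, and it uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) purely to show how two sibling caption stages become runnable together once their shared upstream completes, with each packager waiting on its own caption stage.

```python
from graphlib import TopologicalSorter

# Hypothetical stages: each key maps to the set of stages it depends on.
STAGES = {
    "ingest_captions": set(),
    "encode_video": set(),
    "caption_to_imsc": {"ingest_captions"},
    "caption_to_webvtt": {"ingest_captions"},
    "package_dash": {"caption_to_imsc", "encode_video"},
    "package_hls": {"caption_to_webvtt", "encode_video"},
}


def execution_waves(stages):
    """Group stages into waves; every stage in a wave can run in parallel."""
    sorter = TopologicalSorter(stages)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the wave complete to unlock dependents
    return waves
```

With the graph above, the two caption conversions land in the same wave and the two packagers in the next one. Retry and cancellation policy would sit around the `done` call in a real scheduler and are left out of this sketch.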
For broadcast workflows requiring IMSC delivery (DVB, HbbTV), the IMSC output is what the underlying tooling produces from the source. Editorial review for out-of-spec content (forced-narrative semantics, region-specific markup) is operator-side work today rather than a pipeline-native gate — there's no decision node in the DAG runtime that pauses on review and resumes on operator approval.
The strict-broker security model handles IMSC content like any pipeline payload — workers carry no ambient credentials; content access flows through short-lived presigned URLs scoped per stage; access is disposed on completion. Encryption isn't typically applied to caption tracks but is supported when required.
For customers building their first IMSC-capable workflow, the conversation focuses on authoring tools (do you have IMSC-capable subtitle production?), conversion strategy (single-master IMSC vs separate IMSC+WebVTT masters), and target distribution requirements (broadcast vs streaming vs both). The pipeline side is solved; the editorial integration with subtitle production tooling is where customer-specific work happens.