
HLS X-TIMESTAMP-MAP — webvtt subtitle timing alignment for HLS

Practical reference on the HLS X-TIMESTAMP-MAP header — the MPEGTS:LOCAL syntax, why webvtt subtitles need it, common timing bugs, generation in pipelines.

By MpegFlow Engineering Team · Protocols · May 8, 2026 · 9 min read · 1,813 words
In this topic
  1. What X-TIMESTAMP-MAP is
  2. The math
  3. Why X-TIMESTAMP-MAP exists
  4. Common configurations
  5. Common bugs
  6. Generation in pipelines
  7. Verification
  8. The fMP4 / CMAF subtitle case
  9. Cross-platform considerations
  10. Operational considerations
  11. A note on the timescale conversion math
  12. Multi-segment cue handling
  13. What MpegFlow does with X-TIMESTAMP-MAP

X-TIMESTAMP-MAP is one of the more obscure HLS headers and one of the most common sources of subtitle timing bugs in HLS pipelines. It tells the player how the cue times in a WebVTT subtitle segment align with the video's media timeline. Get it wrong and subtitles appear at the wrong time relative to the video: off by milliseconds at best, by hours at worst. This page is the engineering reference for what X-TIMESTAMP-MAP does and how to set it correctly.

#What X-TIMESTAMP-MAP is

X-TIMESTAMP-MAP is a header that appears INSIDE WebVTT subtitle files (specifically in HLS subtitle segments). Not in the manifest — in the subtitle segment file itself.

The format:

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000

00:00:01.000 --> 00:00:04.000
First subtitle text.

Two key fields:

  • MPEGTS:N is an MPEG-TS timestamp in 90 kHz units: the point on the video's media timeline that the LOCAL time in this WebVTT file maps to.
  • LOCAL:HH:MM:SS.mmm is the WebVTT-internal time that corresponds to MPEGTS. Typically 00:00:00.000.

The relationship: when the video reaches MPEGTS time N (in 90 kHz units), the WebVTT timeline is at the LOCAL time. Every cue time in the file is measured from that anchor, so a cue at LOCAL plus t seconds displays at video PTS N + t × 90,000.

#The math

A worked example:

  • Video stream is in MPEG-TS with PTS values starting at 900,000 (90 kHz units = 10 seconds in).
  • WebVTT subtitles are authored relative to LOCAL time 0.
  • First subtitle cue is at LOCAL 00:00:01.000 (one second after subtitle file start).
  • We want the subtitle to display when video is at PTS 990,000 (11 seconds in MPEG-TS terms).

X-TIMESTAMP-MAP setting:

X-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000

The player reads this and computes: the cue at LOCAL 00:00:01.000 maps to MPEGTS 900,000 + 1 second × 90,000 = 990,000, so it displays when the video reaches PTS 990,000.

In short: the header pins one LOCAL time to one MPEGTS value, and every cue time is measured from that LOCAL anchor.
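The mapping above can be sketched in a few lines of Python. This is a minimal illustration, not a player implementation: the function names (`vtt_time_to_ms`, `parse_timestamp_map`, `cue_pts`) are hypothetical, timestamps are assumed to carry exactly three fractional digits, and 33-bit PTS wraparound is ignored.

```python
import re

TICKS_PER_MS = 90  # 90 kHz clock = 90 ticks per millisecond

_MAP_RE = re.compile(r"^X-TIMESTAMP-MAP=(.+)$", re.MULTILINE)


def vtt_time_to_ms(text):
    """Convert a WebVTT timestamp (HH:MM:SS.mmm or MM:SS.mmm) to integer milliseconds."""
    parts = text.strip().split(":")
    seconds, _, millis = parts[-1].partition(".")
    minutes = int(parts[-2])
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return ((hours * 60 + minutes) * 60 + int(seconds)) * 1000 + int(millis or 0)


def parse_timestamp_map(vtt_text):
    """Return (mpegts_ticks, local_ms); a missing header means cue time 0 maps to PTS 0."""
    match = _MAP_RE.search(vtt_text)
    if match is None:
        return 0, 0
    # Fields may appear in either order, so parse them by name.
    fields = dict(item.split(":", 1) for item in match.group(1).strip().split(","))
    return int(fields["MPEGTS"]), vtt_time_to_ms(fields["LOCAL"])


def cue_pts(cue_time, mpegts, local_ms):
    """Map a cue timestamp to the video PTS (90 kHz ticks) at which it should display."""
    return mpegts + (vtt_time_to_ms(cue_time) - local_ms) * TICKS_PER_MS


segment = "WEBVTT\nX-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000\n\n"
mpegts, local_ms = parse_timestamp_map(segment)
print(cue_pts("00:00:01.000", mpegts, local_ms))  # 990000
```

Working in integer milliseconds rather than floating-point seconds keeps the tick arithmetic exact.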

#Why X-TIMESTAMP-MAP exists

WebVTT is a generic subtitle format. Its cue timestamps (HH:MM:SS.mmm) count from the file's own zero point, 00:00:00.000, and carry no knowledge of any container's clock.

HLS video segments use MPEG-TS internal timestamps. A live stream might start at PTS 900,000 (already 10 seconds in by the time someone tunes in). A VOD asset might start at any MPEG-TS time depending on how it was packaged.

The mismatch: WebVTT cues are relative to local zero; HLS video timestamps are absolute MPEG-TS time. X-TIMESTAMP-MAP is the bridge.

For pipelines that produce subtitle segments perfectly aligned with video segments (subtitle segment 1 covers the same time range as video segment 1), the subtitle's LOCAL time 00:00:00.000 maps to the video's PTS at the start of that segment. X-TIMESTAMP-MAP encodes exactly that mapping.

#Common configurations

Standard live HLS (subtitle segment per video segment):

For each subtitle segment, X-TIMESTAMP-MAP points to the start PTS of the corresponding video segment:

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:1800000,LOCAL:00:00:00.000

00:00:00.500 --> 00:00:03.500
Subtitle for this 4-second segment.

If the corresponding video segment starts at MPEG-TS time 1,800,000 (20 seconds into the stream), the subtitle cue at LOCAL 00:00:00.500 displays at MPEG-TS 1,845,000 (20.5 seconds in).

VOD HLS with subtitles aligned to media zero:

For VOD where subtitle file 0 maps to video time 0:

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:0,LOCAL:00:00:00.000

00:00:01.000 --> 00:00:05.000
First subtitle in the VOD.

This is the simplest case — subtitle LOCAL time equals video media time directly.

#Common bugs

Things that go wrong with X-TIMESTAMP-MAP:

Bug 1: Missing X-TIMESTAMP-MAP

If a WebVTT subtitle segment in HLS has no X-TIMESTAMP-MAP, players assume that cue time 0 maps to an MPEG-TS timestamp of 0. For VOD whose video PTS starts at zero this works. For live streams, or any asset whose MPEG-TS timeline starts somewhere non-zero, subtitles appear at the wrong times.

Bug 2: Wrong MPEGTS value

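If MPEGTS doesn't match the actual video PTS at the LOCAL anchor, every cue in the segment is shifted by the same constant offset. If MPEGTS is too large, cues map to a later video PTS and subtitles appear late. If it is too small, they appear early.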

Bug 3: Per-segment MPEGTS not updated

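Each subtitle segment whose cue times restart at zero needs its X-TIMESTAMP-MAP updated to the corresponding video segment's start PTS. If such a pipeline reuses a constant MPEGTS across segments, only the first segment's subtitles are correctly timed. A constant MPEGTS is only valid when cue times run on one continuous timeline across all segments instead of restarting per segment.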

Bug 4: Mismatched timescales

MPEGTS is in 90 kHz units. A pipeline that accidentally uses video PTS in a different timescale (e.g., raw seconds × 1000) produces garbage MPEGTS values.

Bug 5: Live stream wraparound

MPEG-TS PTS wraps every ~26.5 hours (33-bit PTS). For long-running live streams, X-TIMESTAMP-MAP must handle wraparound correctly.
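One way to handle wraparound is to track an unwrapped tick count internally and reduce it modulo 2^33 only when writing the header. The sketch below assumes the header should carry the same wrapped 33-bit value that the video segment itself carries; `wrap_pts` and `unwrap_pts` are hypothetical helper names, and the unwrap step assumes consecutive observations are less than half the PTS range (about 13 hours) apart.

```python
PTS_MODULUS = 1 << 33  # 33-bit PTS: 8,589,934,592 ticks, about 26.5 hours at 90 kHz


def wrap_pts(unwrapped_ticks):
    """Reduce a monotonically growing tick count to the 33-bit value carried in MPEG-TS."""
    return unwrapped_ticks % PTS_MODULUS


def unwrap_pts(pts, previous_unwrapped):
    """Extend a 33-bit PTS onto the unwrapped timeline, choosing the value nearest
    to the previous unwrapped observation."""
    base = previous_unwrapped - (previous_unwrapped % PTS_MODULUS)
    candidate = base + pts
    half = PTS_MODULUS // 2
    if candidate - previous_unwrapped > half:
        candidate -= PTS_MODULUS  # pts belongs to the previous wrap period
    elif previous_unwrapped - candidate > half:
        candidate += PTS_MODULUS  # pts has wrapped past zero
    return candidate


# One second before the wrap, then a segment that starts five seconds after it.
before = PTS_MODULUS - 90_000
after = unwrap_pts(450_000, before)
print(after - before)    # 540000 ticks = 6 seconds of elapsed media time
print(wrap_pts(after))   # 450000, the value to write as MPEGTS
```

Keeping the unwrapped count as the pipeline's source of truth means segment durations and cue offsets stay simple subtractions even across the wrap.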

Bug 6: Cue boundary alignment

Subtitle cues that span multiple subtitle segments need careful handling. A cue that starts in segment N may still be on screen during segment N+1, so both segments have to carry it. Some pipelines duplicate the cue in each segment; others split it at the segment boundary.

#Generation in pipelines

For pipeline subtitle segment generation:

  1. Determine the corresponding video segment's start PTS in MPEG-TS units (90 kHz).
  2. Identify subtitle cues that should appear during this segment's time range.
  3. Compute LOCAL times for each cue relative to the segment's start.
  4. Write the WebVTT segment with X-TIMESTAMP-MAP set to the video segment's start PTS.
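The four steps above can be sketched as a single function. This is an illustration under stated assumptions, not a packager: `build_segment` and `format_vtt_time` are hypothetical names, cues are given as (start_ms, end_ms, text) tuples on the media timeline, and cues that cross a segment edge are clipped to the segment.

```python
def format_vtt_time(ms):
    """Format integer milliseconds as a WebVTT timestamp HH:MM:SS.mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def build_segment(cues, seg_start_ms, seg_end_ms, seg_start_pts):
    """Build one WebVTT segment.

    cues: iterable of (start_ms, end_ms, text) on the media timeline.
    seg_start_ms / seg_end_ms: the segment's time range on that same timeline.
    seg_start_pts: start PTS of the matching video segment, in 90 kHz ticks (step 1).
    """
    lines = [
        "WEBVTT",
        f"X-TIMESTAMP-MAP=MPEGTS:{seg_start_pts},LOCAL:00:00:00.000",  # step 4
        "",
    ]
    for start, end, text in cues:
        if end <= seg_start_ms or start >= seg_end_ms:
            continue  # step 2: cue does not overlap this segment
        # Step 3: clip to the segment and make the times relative to its start.
        local_start = max(start, seg_start_ms) - seg_start_ms
        local_end = min(end, seg_end_ms) - seg_start_ms
        lines += [f"{format_vtt_time(local_start)} --> {format_vtt_time(local_end)}", text, ""]
    return "\n".join(lines)


cues = [(20_500, 23_500, "Subtitle for this 4-second segment.")]
print(build_segment(cues, 20_000, 24_000, 1_800_000))
```

With a stream whose PTS starts at zero, this reproduces the live example from the configurations section: a segment starting at PTS 1,800,000 with one cue at LOCAL 00:00:00.500.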

ffmpeg-based pipeline (using -copyts for timestamp preservation; -hls_time sets the segment duration for the HLS muxer):

ffmpeg -copyts -i source_with_subs.mp4 -c:v copy -c:a copy \
  -c:s webvtt -hls_time 6 -hls_segment_type mpegts \
  -hls_flags independent_segments output.m3u8

The exact behavior, including the X-TIMESTAMP-MAP values written, depends on ffmpeg version and configuration, so inspect the emitted segments rather than assuming. For complex subtitle pipelines, a dedicated packager (Shaka Packager, custom code) is usually preferred.

For Shaka Packager:

packager input=source.mp4,stream=text,segment_template=subs/seg-$Number$.vtt

Shaka Packager handles X-TIMESTAMP-MAP generation correctly per segment.

#Verification

To verify X-TIMESTAMP-MAP correctness:

Check the header values:

head -2 subs/seg-001.vtt

Should show WEBVTT followed by X-TIMESTAMP-MAP=MPEGTS:N,LOCAL:00:00:00.000.

Verify alignment with corresponding video segment:

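ffprobe -v error -select_streams v:0 -show_entries packet=pts -of csv=p=0 video/seg-001.ts | head

Compare the first video PTS in the segment with the MPEGTS value in the subtitle's X-TIMESTAMP-MAP. When LOCAL is 00:00:00.000 and cue times are relative to the segment start, they should match.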
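The comparison can be automated. The sketch below is one possible QC check, assuming ffprobe is on PATH and that LOCAL is 00:00:00.000; the function names and the `tolerance_ticks` parameter are hypothetical, and `first_video_pts` takes the lowest video packet PTS in the segment so that decode-order reordering does not matter.

```python
import re
import subprocess


def header_mpegts(vtt_text):
    """Return the MPEGTS value from X-TIMESTAMP-MAP, or None if the header is absent."""
    match = re.search(r"^X-TIMESTAMP-MAP=.*?MPEGTS:(\d+)", vtt_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def first_video_pts(ts_path):
    """Lowest video packet PTS in a segment, read via ffprobe.

    Raises ValueError if ffprobe reports no numeric PTS values.
    """
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "packet=pts", "-of", "csv=p=0", ts_path],
        capture_output=True, text=True, check=True,
    )
    values = []
    for line in result.stdout.splitlines():
        field = line.split(",")[0].strip()
        if field.isdigit():  # skip blank lines and N/A entries
            values.append(int(field))
    return min(values)


def is_aligned(vtt_text, video_pts, tolerance_ticks=0):
    """True when the header's MPEGTS is within tolerance of the video segment's start PTS."""
    mpegts = header_mpegts(vtt_text)
    return mpegts is not None and abs(mpegts - video_pts) <= tolerance_ticks
```

A QC job could call `is_aligned(open(vtt_path).read(), first_video_pts(ts_path))` for each segment pair and fail the build on the first mismatch.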

End-to-end test:

Play the HLS stream in hls.js or a test player. Confirm subtitles appear at correct times relative to video.

#The fMP4 / CMAF subtitle case

For CMAF-based HLS (subtitles carried in fMP4 segments rather than as plain-text WebVTT segments):

  • Subtitle data is carried in fMP4 segments: IMSC1/TTML as stpp samples under the subt handler, or WebVTT as wvtt samples under the text handler.
  • Timing uses fMP4's tfdt model rather than X-TIMESTAMP-MAP.
  • X-TIMESTAMP-MAP doesn't apply.

This is the modern approach; X-TIMESTAMP-MAP applies specifically to WebVTT delivered as text segments in HLS. CMAF replaces that side-channel header with native fMP4 timing.

For pipelines transitioning from legacy WebVTT-text-segments to CMAF subtitles, X-TIMESTAMP-MAP becomes irrelevant for the new path.

#Cross-platform considerations

X-TIMESTAMP-MAP support across platforms:

  • iOS / macOS Safari (AVPlayer) — full support since iOS 9 / macOS 10.10. The de facto reference implementation.
  • hls.js — full support since v0.x. Major HLS player on the web.
  • Shaka Player — full support.
  • ExoPlayer (Android) — full support.
  • Older smart TVs (pre-2018) — variable; some require X-TIMESTAMP-MAP, others ignore it.

For modern players, X-TIMESTAMP-MAP works correctly. The bugs are usually pipeline-side (incorrect generation), not player-side.

#Operational considerations

Things that matter for X-TIMESTAMP-MAP in production:

  • Per-segment generation correctness — each subtitle segment's X-TIMESTAMP-MAP must match its corresponding video segment's start PTS.
  • Live PTS evolution — for live streams, PTS values evolve continuously. Pipeline logic must track current video PTS and update subtitle X-TIMESTAMP-MAP per segment.
  • Multi-language consistency — all language subtitle tracks should have aligned X-TIMESTAMP-MAP values for their corresponding segments.
  • Tooling automation — manually computing X-TIMESTAMP-MAP per segment is error-prone. Use packagers that handle it automatically.
  • Verification automation — pipeline QC should verify X-TIMESTAMP-MAP correctness on output. Misalignment is silent until a viewer notices.
  • CMAF migration path — for new pipelines, CMAF subtitles avoid X-TIMESTAMP-MAP entirely. If you're greenfield, consider CMAF subtitles.

#A note on the timescale conversion math

The 90 kHz timescale in MPEG-TS is a historical quirk worth explicit mention for pipeline engineers. It is the 27 MHz MPEG system clock divided by 300, and it gives a whole number of ticks per frame at the common broadcast rates: 3,750 at 24 fps, 3,600 at 25 fps, 3,003 at 29.97 fps, and 3,000 at 30 fps. The choice dates from 1990s MPEG design decisions.

The implication for X-TIMESTAMP-MAP: every time you compute MPEGTS values, you're working in 90 kHz units. If your pipeline internally uses milliseconds, microseconds, or other units, the conversion to 90 kHz must be exact. Common mistakes:

  • Multiplying milliseconds by 90,000 instead of 90 (off by 1000x).
  • Rounding incorrectly when the source time has fractional milliseconds.
  • Mixing 90 kHz values (PTS/DTS and the PCR base) with 27 MHz values (the full PCR including its extension). They are different units.

For pipeline implementations, encapsulate the conversion in a helper function that's tested with edge cases (sub-frame times, large times near wraparound, exact multiples vs fractional).
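A minimal sketch of such a helper, using Python's `fractions.Fraction` so the arithmetic stays exact until a single final rounding step. The name `to_90khz` and the round-half-up policy are assumptions for illustration; a real pipeline might prefer floor or another policy, as long as it is applied consistently.

```python
import math
from fractions import Fraction

TICKS_PER_SECOND = 90_000


def to_90khz(value, units_per_second):
    """Convert a time value to 90 kHz ticks.

    value: int, Fraction, or decimal string (avoid floats, which are inexact).
    units_per_second: 1 for seconds, 1_000 for ms, 1_000_000 for us,
                      27_000_000 for 27 MHz PCR units.
    Rounds half up, exactly once, at the end.
    """
    ticks = Fraction(value) * TICKS_PER_SECOND / units_per_second
    return math.floor(ticks + Fraction(1, 2))


print(to_90khz(10_000, 1_000))            # 10 s in ms -> 900000
print(to_90khz(Fraction(1001, 30000), 1)) # one 29.97 fps frame -> 3003
print(to_90khz(27_000_000, 27_000_000))   # 1 s of 27 MHz units -> 90000
```

Passing the source timescale explicitly makes the unit of every input visible at the call site, which is where the milliseconds-versus-seconds mistakes listed above usually hide.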

#Multi-segment cue handling

A subtle case: a subtitle cue that should display from time A to time B might span multiple subtitle segments if the cue duration exceeds segment duration. For example:

  • Subtitle cue: "This is a long subtitle" displayed from 00:00:18.000 to 00:00:24.000 (6 seconds).
  • Subtitle segments: 4 seconds each, segment N covers 16-20s, segment N+1 covers 20-24s.

The cue spans both segments. Two approaches:

  1. Duplicate the cue in both segments. Each segment shows the relevant portion. Risk: rendering glitches at boundaries.
  2. Split the cue into two cues at the segment boundary. Each segment has its own complete cue.

For HLS WebVTT-text segments, option 2 is more common. Each segment is self-contained with its own X-TIMESTAMP-MAP and self-contained cues.
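Option 2 can be sketched as follows, assuming uniform segment durations that start at media time zero. `split_cue` is a hypothetical name; it returns one piece per overlapped segment, with times already made relative to that segment's start.

```python
def split_cue(start_ms, end_ms, text, segment_ms):
    """Split a cue at segment boundaries.

    Returns a list of (segment_index, local_start_ms, local_end_ms, text),
    where the local times are relative to the start of that segment.
    """
    pieces = []
    index = start_ms // segment_ms  # first segment the cue touches
    while index * segment_ms < end_ms:
        seg_start = index * segment_ms
        seg_end = seg_start + segment_ms
        piece_start = max(start_ms, seg_start)
        piece_end = min(end_ms, seg_end)
        pieces.append((index, piece_start - seg_start, piece_end - seg_start, text))
        index += 1
    return pieces


# The example above: a cue from 18 s to 24 s with 4-second segments.
for piece in split_cue(18_000, 24_000, "This is a long subtitle", 4_000):
    print(piece)
# (4, 2000, 4000, 'This is a long subtitle')
# (5, 0, 4000, 'This is a long subtitle')
```

Segment index 4 covers 16-20s and index 5 covers 20-24s, so the first piece ends exactly where the second begins and the viewer sees one continuous cue.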

#What MpegFlow does with X-TIMESTAMP-MAP

X-TIMESTAMP-MAP control is not currently a customer-facing knob in MpegFlow's pipeline. The HLS packaging stage runs on an FfmpegExecutor worker and emits whatever X-TIMESTAMP-MAP the FFmpeg HLS muxer produces from the source's PTS. There is no pipeline-level configuration that lets a workflow override or normalize the MPEGTS/LOCAL values per segment.

For customers with strict X-TIMESTAMP-MAP requirements (specific player targets, legacy WebVTT-text segment compatibility with non-default offsets, edge cases in muxer behavior), the workaround today is post-processing the WebVTT output in their own tooling outside MpegFlow's pipeline boundary. Native pipeline-level control over X-TIMESTAMP-MAP synthesis is on the backlog.

For the broader Phase 2D conversation: dedicated packagers (Shaka Packager) typically expose finer-grained control over subtitle-segment timing semantics. Phase 2D / Shaka Packager integration is roadmap, not currently shipped — that's the timeline on which deeper subtitle-timing controls would land.

For modern pipelines using CMAF subtitles, X-TIMESTAMP-MAP isn't applicable; fMP4 subtitle segments use native fMP4 timing. Migrating from legacy WebVTT-text segments to CMAF subtitles sidesteps the X-TIMESTAMP-MAP issue entirely, which is part of why we recommend it where the player ecosystem permits.

The strict-broker security model handles subtitle segments the same as video segments — workers carry no ambient credentials; content access flows through short-lived presigned URLs scoped per stage; access is disposed on completion.

For customers debugging subtitle timing issues in production today, the standing recommendation: verify X-TIMESTAMP-MAP per-segment in the emitted output, identify whether the issue is muxer behavior or upstream PTS misalignment, and either post-process or migrate to CMAF subtitles depending on the root cause.

The general guidance: X-TIMESTAMP-MAP is precise but not arcane. Most "subtitles wrong time" bugs trace to MPEGTS values that don't match the corresponding video segment's PTS. If you're still on WebVTT-text segments and your player ecosystem supports CMAF subtitles, migrating sidesteps the whole class of issues.
