HLS segment duration — picking the right TARGETDURATION for your use case

MpegFlow

Practical guide to HLS segment duration selection — Apple's 6-second recommendation, GOP alignment, latency vs request overhead tradeoff, LL-HLS partial segment relationship.

HLS segment duration is one of the most consequential single configuration choices in HLS pipeline design. It affects live latency, bandwidth efficiency, manifest size, ABR adaptation behavior, and CDN caching effectiveness. Apple's recommendation is 6 seconds; many pipelines use 4 seconds; some live workflows go to 2 seconds. This page is the engineering reference for what segment duration actually does and how to pick the right value for your pipeline.

What segment duration is

In HLS, content is split into segments — each segment is an HTTP-fetchable file containing a few seconds of media. The HLS media playlist declares each segment's duration with #EXTINF: tags:

#EXTM3U
#EXT-X-VERSION:6
#EXT-X-TARGETDURATION:6
#EXTINF:6.000,
seg-001.ts
#EXTINF:6.000,
seg-002.ts
#EXTINF:6.000,
seg-003.ts

The #EXT-X-TARGETDURATION declares the maximum segment duration in the playlist. The actual #EXTINF durations should be at or below this value. Apple's spec requires segments to not exceed TARGETDURATION + 0.5 seconds.

For most production HLS, segments are produced at fixed durations (every segment is 4 or 6 seconds). Some pipelines produce variable durations based on GOP boundaries.

Apple's recommendation

Apple's HLS authoring guide (current as of 2026) recommends 6-second segments for VOD. The rationale:

Long enough that HTTP request overhead is amortized.
Short enough for reasonable live latency (15-30s end-to-end).
Compatible with player buffer requirements (typical 3 segments minimum = 18s buffer).
CDN-cache-friendly (large segments benefit from CDN caching).
Manifest size manageable (60-minute content = 600 segments at 6s, manifest ~50 KB).

Apple's media stream validator flags segment durations outside the 4-10 second range as warnings. For live, 6-second is the safe production default.

Latency implications

For live streaming, segment duration affects latency directly:

Encoder produces complete segment — segment duration must elapse before the file is ready.
Manifest update — the new segment is added to the manifest after it's complete.
Player polls / reads manifest — typically every 1× target duration on average.
Player buffer — 3 segments minimum buffer = 3× segment duration.

End-to-end latency floor for standard HLS:

latency ≈ segment_duration × (1 + buffer_segments) + manifest_polling_avg

For 6-second segments with 3-segment buffer:

latency ≈ 6 × 4 + 3 = 27 seconds

For 4-second segments with 3-segment buffer:

latency ≈ 4 × 4 + 2 = 18 seconds

For 2-second segments with 3-segment buffer:

latency ≈ 2 × 4 + 1 = 9 seconds

Shorter segments = lower latency. The trade-off: more HTTP requests, larger manifest, less CDN cache benefit.

GOP alignment requirement

HLS segments must start at IDR keyframes (closed GOP boundaries). This is a hard constraint — the segment boundary IS a keyframe boundary.

The relationship: segment duration must be an integer multiple of GOP duration.

For 2-second GOPs, valid segment durations are: 2s, 4s, 6s, 8s, 10s, 12s, etc.

For 1-second GOPs, valid segment durations are: 1s, 2s, 3s, 4s, etc.

Mismatched GOP and segment duration breaks ABR adaptation — segments don't start at keyframes, and player switching between variants doesn't work cleanly.

The typical 2026 production setup:

2-second GOPs (low-latency pipelines: 1-second).
4-second segments (low-latency live) or 6-second segments (VOD baseline).

For live encoding, force keyframes at exact intervals matching the segment duration:

ffmpeg -force_key_frames "expr:gte(t,n_forced*4)" ... segment_duration=4

This guarantees keyframes at every multiple of 4 seconds, enabling 4-second segments.

Bandwidth and request overhead

Per-segment HTTP overhead:

TCP/HTTP handshake: ~10-50 ms per request.
HTTP header overhead: ~500-2000 bytes per response.
Connection pooling: reduces but doesn't eliminate overhead.

For a 60-minute stream:

6-second segments: 600 requests, ~120 KB header overhead, ~6 seconds of cumulative request latency.
4-second segments: 900 requests, ~180 KB overhead, ~9 seconds latency.
2-second segments: 1800 requests, ~360 KB overhead, ~18 seconds latency.
1-second segments: 3600 requests, ~720 KB overhead, ~36 seconds latency.

For VOD playback over HTTP/2 with persistent connections, the overhead is amortized and not catastrophic. For live streaming over flaky networks, fewer requests = better resilience.

Manifest size

Live HLS manifests grow with segment count. For a sliding-window live manifest of, say, 10 minutes:

6-second segments: 100 segments × ~80 bytes per entry = ~8 KB manifest.
2-second segments: 300 segments × ~80 bytes per entry = ~24 KB manifest.

For very long live streams or DVR-style "all segments retained" manifests:

24-hour DVR with 6-second segments: 14,400 segments = ~1.1 MB manifest.
24-hour DVR with 2-second segments: 43,200 segments = ~3.4 MB manifest.

Manifest fetched on every player update. Size matters for live polling behavior.

For LL-HLS, manifest size is even more important because manifest is fetched more frequently (potentially every part). Smaller is better for LL-HLS.

CDN caching efficiency

Larger segments are more CDN-cache-friendly:

Each segment is one cacheable unit.
Cache hit ratio is per-segment.
Smaller segments = more cache entries = higher cache memory pressure.
Smaller segments = smaller per-cache-hit benefit.

For VOD with many viewers per asset, cache benefits compound. 6-second segments cache better than 2-second segments at scale.

For live with manifest sliding window, segment lifetime is short anyway. The cache benefit difference between segment durations is smaller for live.

LL-HLS partial-segment relationship

LL-HLS splits segments into smaller "parts" (typically 200-500 ms each). The relationship to segment duration:

Standard segment: 6 seconds = 1 segment.
Partial segments: 6 seconds = 18 parts at 333 ms each.
LL-HLS players consume parts as they arrive; segments are "complete" when all their parts are loaded.

Segment duration in LL-HLS is the same as standard HLS; partial segments are an additional layer within. The choice:

6-second segments + 333 ms parts = standard LL-HLS.
4-second segments + 200 ms parts = aggressive LL-HLS, lower latency.
2-second segments + 100 ms parts = ultra-aggressive LL-HLS, highest CDN load.

For most LL-HLS deployments, 6-second segments with 333 ms parts is the default.

Variable segment durations

Some pipelines produce variable segment durations to align with content boundaries:

Scene-change-aligned segments (segment ends at major scene transitions).
GOP-aligned segments where GOP size varies (encoder chooses GOP based on scene complexity).
Ad-break-aligned segments (segment ends at ad insertion points).

Variable durations work in HLS — each segment's #EXTINF declares its individual duration. The TARGETDURATION must be at least the max segment duration.

The trade-off: variable durations are more complex to package and harder for players to handle. Most pipelines use fixed durations for operational simplicity.

Choosing for your use case

Standard VOD (no latency concern): 6-second segments. Apple recommendation; CDN-friendly; player default expectations.

Standard live (15-45s latency acceptable): 6-second segments. Same as VOD; lowest operational complexity.

Low-latency live (target 10-15s): 4-second segments. Modest reduction in latency at modest cost in HTTP overhead.

LL-HLS (target 2-3s): 4-6 second segments with 200-500 ms partial segments. Standard configuration.

Premium VOD with seek-heavy use cases: 4-second segments. Smaller segments = faster seek granularity (less to discard when seeking past a GOP).

Mobile-only delivery: 6-second segments. Mobile networks benefit from fewer requests; CDN cache benefit matters.

Bandwidth-constrained delivery: 6+ second segments. Minimize HTTP overhead.

Long-form DVR live: 6-second segments. Smaller manifest size matters.

For most use cases, 6-second segments are the safe default. Deviate when specific latency, throughput, or operational requirements justify it.

Operational considerations

Things that matter for segment duration in production:

GOP-segment alignment — verify GOP duration divides segment duration evenly.
Force keyframe alignment — use -force_key_frames to ensure keyframes at exact segment boundaries.
Cross-variant alignment — all ABR variants should share segment boundaries. Misaligned variants break adaptation.
Apple Media Stream Validator — for production HLS, validate against Apple's spec. Segment duration warnings are easy to fix.
Manifest size monitoring — for live streams with long lookbacks, watch manifest size growth.
Segment duration drift — encoder timing imprecision can cause segments to be 6.001s or 5.998s. Ensure TARGETDURATION accommodates the actual max.

What MpegFlow does with HLS segment duration

MpegFlow's DAG runtime configures HLS segment duration as a workflow-level parameter that flows into the FfmpegExecutor stages handling encode (for keyframe placement) and the downstream HLS packaging stage. The partitioner persists each stage to job_stages; cross-stage data flow ensures the keyframe cadence chosen at encode-time matches the segment boundaries the packager produces. The default for VOD and standard live is a 6-second target; LL-HLS workflows configure shorter targets where the underlying tooling supports it.

The encode stage uses force-key-frames (or equivalent encoder flags) to align GOP boundaries with segment boundaries automatically. Cross-variant alignment is enforced through consistent keyframe configuration across the parallel rendition stages — every rendition's FfmpegExecutor invocation receives the same keyframe spec, so all renditions share segment boundaries and ABR adaptation stays clean.

Production-grade LL-HLS partial-segment handling is constrained by what the FFmpeg HLS muxer exposes. Tight low-latency targets that need dedicated-packager features (precise partial-segment generation, advanced manifest dialects) are part of the Phase 2D conversation about Shaka Packager integration, which is roadmap, not currently shipped.

The strict-broker security model handles segment duration as workflow configuration; workers carry no ambient credentials and operate per spec.

For customers tuning segment duration for their pipeline, the standing recommendation: 6-second segments for VOD and standard live; 4-second segments for low-latency live where current packaging suffices; consult on Phase 2D timeline when targets demand dedicated-packager partial-segment behavior.

The general guidance: segment duration is a structural decision worth thinking through, not a parameter to tune. The defaults work for most use cases; deviation is for specific latency or throughput requirements with measured benefit.