
CMAF — the segment format that ended the HLS-vs-DASH duplicate-encoding problem

Practical reference on Common Media Application Format (CMAF) — fragmented MP4 structure, common encryption (CENC), unified HLS+DASH delivery, chunked CMAF for low-latency, and operational benefits.

By MpegFlow Engineering Team · Protocols · May 7, 2026 · 9 min read · 1,720 words
In this topic
  1. What CMAF is
  2. What's "common" — the spec discipline
  3. Common Encryption (CENC)
  4. Chunked CMAF
  5. CMAF for live vs VOD
  6. Box-level details (for the operationally curious)
  7. CMAF vs MPEG-TS
  8. CMAF compatibility caveats
  9. What MpegFlow does with CMAF

CMAF — Common Media Application Format, ISO/IEC 23000-19 — is the segment format that finally let streaming services stop encoding twice. Before CMAF, HLS used MPEG-2 Transport Stream segments and DASH used fragmented MP4 segments, so a pipeline serving both protocols encoded and packaged each set of media twice. CMAF standardizes a single fragmented MP4 segment format that both modern HLS and DASH consume. Same segments, two manifests, one set of files in storage. This is the operational unlock that makes 2026 streaming pipelines tractable.

This page is the engineering reference: what CMAF is, how it differs from earlier fMP4 implementations, how Common Encryption rides on top of it, and how chunked CMAF enables low-latency.

#What CMAF is

CMAF is a fragmented MP4 specification — fMP4, the same container family DASH has used from the start. The "Common" in the name signals the cross-protocol intent: the same segments serve HLS and DASH players. Apple added fMP4 support to HLS in 2016 (iOS 10+, tvOS 10+, macOS 10.12+); DASH consumed fMP4 from the beginning.

Structurally, a CMAF segment is an fMP4 file with:

  • Initialization segment — init.mp4 containing ftyp and moov boxes. Codec parameters, dimensions, timescales. Loaded once per variant.
  • Media segments — sequence of .m4s files containing moof (movie fragment) + mdat (media data) box pairs. The actual encoded media.

A typical layout for a single variant:

1080p/
  init.mp4
  seg-00001.m4s
  seg-00002.m4s
  seg-00003.m4s
  ...

Each .m4s is a self-contained fragment with its own moof + mdat. Players concatenate the init segment with media segments to play.
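The init/media split is easy to verify directly on the wire. A minimal sketch in plain Python (stdlib only) that walks the top-level boxes of a segment; the segment bytes here are synthetic stand-ins for a real .m4s, not output from any particular packager:

```python
import struct

def parse_top_level_boxes(data: bytes):
    """Return (box_type, size) for each top-level ISO-BMFF box."""
    offset = 0
    boxes = []
    while offset + 8 <= len(data):
        size, = struct.unpack(">I", data[offset:offset + 4])
        box_type = data[offset + 4:offset + 8].decode("ascii")
        # size == 1 (64-bit largesize) and size == 0 (extends to EOF)
        # are omitted in this sketch for brevity.
        boxes.append((box_type, size))
        offset += size
    return boxes

def make_box(box_type: bytes, payload: bytes) -> bytes:
    # 32-bit size (including the 8-byte header) + 4-char type + payload.
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

# Synthetic media segment: one moof + mdat pair, as in a CMAF fragment.
segment = make_box(b"moof", b"\x00" * 16) + make_box(b"mdat", b"\xff" * 32)
print(parse_top_level_boxes(segment))  # [('moof', 24), ('mdat', 40)]
```

A real init segment would show `ftyp` + `moov` at the top level instead; players concatenate that once with each media segment's `moof` + `mdat` stream.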

#What's "common" — the spec discipline

CMAF's value isn't that it's a new container. fMP4 already existed. CMAF's value is the constraints the spec imposes:

  • Single track per segment — segments contain exactly one media track (one video, one audio, or one subtitle). HLS expects this; DASH allows multiplexed segments. CMAF enforces single-track for cross-protocol compatibility.
  • Defined timing structure — the timescale, base media decode time, and fragment durations are constrained to ensure both HLS and DASH players parse them consistently.
  • Defined boxes — the spec specifies exactly which fMP4 boxes are required, optional, or prohibited. No vendor extensions that would break cross-player compatibility.
  • Defined encryption format — Common Encryption (CENC) profile defines the exact AES mode, IV format, and key identifier scheme used in CMAF segments.

The constraints sound minor; they're the difference between "fMP4 segments" (theoretically cross-protocol) and "CMAF segments" (actually cross-protocol).

#Common Encryption (CENC)

CENC, formally ISO/IEC 23001-7, is the encryption framework CMAF uses. It defines several protection schemes; two matter in practice:

  • cenc — AES-CTR (counter mode). The historical scheme for Widevine and PlayReady.
  • cbcs — pattern-based AES-CBC (cipher block chaining). Required by FairPlay; supported by current Widevine and PlayReady clients as well.

The key insight: with CMAF + CBC mode (cbcs), the same encrypted segments are decryptable by all three major DRM systems. FairPlay decrypts via the HLS manifest, Widevine and PlayReady decrypt via the DASH manifest, all using their own license-delivery flows but consuming the same encrypted bytes.

This is why CMAF won the operational battle. Pre-CMAF, multi-DRM meant multiple encryption passes — encode in cleartext, encrypt twice (once with CTR for Widevine/PlayReady, once with CBC for FairPlay), package twice. With CMAF cbcs, encrypt once, package once, both manifests deliver to all DRMs.

The trade-off: older Widevine and PlayReady clients predate cbcs support, and cross-DRM key delivery still requires per-platform license flows. The operational simplification more than pays for both.
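The cbcs pattern is worth seeing concretely: within each protected range, 16-byte blocks are striped in a crypt:skip pattern (1:9 for video), and any trailing partial block stays clear. A sketch of just the striping logic — the XOR "cipher" here is a stand-in for real AES-CBC, not actual encryption:

```python
BLOCK = 16  # AES block size in bytes

def toy_cipher(block: bytes, key: bytes) -> bytes:
    # Stand-in for AES-CBC on one block -- XOR only, NOT real encryption.
    return bytes(b ^ k for b, k in zip(block, key * (BLOCK // len(key))))

def cbcs_stripe(data: bytes, key: bytes, crypt: int = 1, skip: int = 9) -> bytes:
    """Apply the cbcs crypt:skip pattern over full 16-byte blocks.

    Default 1:9 pattern: transform 1 block, leave 9 clear, repeat.
    A trailing partial block is always left clear, as in cbcs.
    """
    out = bytearray(data)
    pattern = crypt + skip
    n_full = len(data) // BLOCK
    for i in range(n_full):
        if i % pattern < crypt:
            start = i * BLOCK
            out[start:start + BLOCK] = toy_cipher(data[start:start + BLOCK], key)
    return bytes(out)

sample = bytes(range(256)) * 2  # 512 bytes = 32 blocks
enc = cbcs_stripe(sample, key=b"\x5a" * BLOCK)
# Blocks 0, 10, 20, 30 are transformed; everything else passes through clear.
changed = [i for i in range(32) if enc[i*16:(i+1)*16] != sample[i*16:(i+1)*16]]
print(changed)  # [0, 10, 20, 30]
```

The pattern means most of each video sample is never touched by the cipher at all, which is part of why cbcs is cheap to apply at scale.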

#Chunked CMAF

The standard CMAF segment is delivered as a complete file. A 6-second segment isn't available to the player until the encoder finishes producing all 6 seconds. This is the latency floor for live CMAF: roughly 1-2× the segment duration before any byte hits the player.

Chunked CMAF splits the segment into smaller chunks (typically 200-500ms each). The encoder produces chunks as encoding progresses; the server delivers them via HTTP chunked transfer encoding; the player consumes them as they arrive. The first chunks are already playing before the full segment is finished.

This is the mechanism behind both LL-HLS (with partial segments) and LL-DASH. Same chunked-CMAF segments work for both.

The benefit: live latency drops from 15-45 seconds (standard CMAF) to 2-3 seconds (chunked CMAF + LL-HLS / LL-DASH). The cost is operational complexity: chunked encoding requires real-time chunk emission, CDNs need to support chunked transfer with proper cache semantics, and the failure modes are subtler (a slow chunk delays the whole segment).

#CMAF for live vs VOD

For VOD, CMAF segments are typically 4-10 seconds long. Tradeoffs:

  • Shorter segments (2-4s) — faster ABR adaptation, smaller storage granularity, better for chunked-CMAF live, more HTTP overhead.
  • Longer segments (8-10s) — fewer files, better compression efficiency at the segment level, slower ABR adaptation.

Premium streaming typically uses 4-6 second segments as the balance. Live with chunked CMAF often uses 2-4 second segments to enable lower latency targets.

For live, segment duration interacts with chunk duration:

  • 4-second segments + 250ms chunks = 16 chunks per segment, ~250ms latency floor.
  • 2-second segments + 200ms chunks = 10 chunks per segment, ~200ms latency floor.

The latency floor here assumes ideal CDN behavior; real-world latency adds CDN cache validation, network propagation, and player buffer. 2-3s end-to-end is achievable; sub-1s is in WebRTC territory, not CMAF territory.
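The arithmetic behind those floors, made explicit (function names are ours, illustrative only):

```python
def chunks_per_segment(segment_s: float, chunk_ms: int) -> int:
    return int(segment_s * 1000) // chunk_ms

def standard_floor_ms(segment_s: float) -> int:
    # Without chunking, the player waits for the whole segment to exist.
    return int(segment_s * 1000)

def chunked_floor_ms(chunk_ms: int) -> int:
    # With chunked transfer, the first chunk ships as soon as it's encoded.
    return chunk_ms

print(chunks_per_segment(4, 250), standard_floor_ms(4), chunked_floor_ms(250))  # 16 4000 250
print(chunks_per_segment(2, 200), standard_floor_ms(2), chunked_floor_ms(200))  # 10 2000 200
```

Real end-to-end latency stacks CDN, network, and player buffer on top of these floors, which is how 250ms of theory becomes 2-3s in practice.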

#Box-level details (for the operationally curious)

A few fMP4 box-level details that matter operationally:

  • ftyp/styp brands — cmfc signals CMAF track conformance; related brands (cmf2, cmff, cmfl, cmfs) mark fragments, chunks, and segments, while iso6 covers general fMP4. Some legacy players are picky about brand strings.
  • tfdt (track fragment decode time) — gives the absolute timestamp of each fragment. CMAF requires its presence; players use it for time alignment.
  • sidx (segment index) — optional but useful for byte-range requests in DASH. For HLS-only delivery, can be omitted.
  • emsg (event message) — CMAF supports inline metadata via emsg boxes. Used for SCTE-35 ad markers, custom ID3 events, in-band cue points.

Pipelines that produce CMAF need to get these right; getting them wrong produces a class of "the manifest passes validation but the player breaks" bugs that is hard to debug.
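tfdt in particular is worth being able to read off the wire. A stdlib-Python sketch that decodes a tfdt payload (version 0 carries a 32-bit baseMediaDecodeTime, version 1 a 64-bit one, per ISO/IEC 14496-12); the sample bytes are synthetic:

```python
import struct

def parse_tfdt(payload: bytes) -> int:
    """Decode baseMediaDecodeTime from a tfdt box payload (after size/type)."""
    version = payload[0]
    # Bytes 1-3 are flags, always zero for tfdt.
    if version == 1:
        (decode_time,) = struct.unpack(">Q", payload[4:12])  # 64-bit
    else:
        (decode_time,) = struct.unpack(">I", payload[4:8])   # 32-bit
    return decode_time

# Synthetic version-1 tfdt: fragment starting at t=90000 in a 90kHz timescale.
payload = bytes([1, 0, 0, 0]) + struct.pack(">Q", 90000)
print(parse_tfdt(payload))  # 90000
```

Dividing the decoded value by the track timescale from the init segment gives the fragment's start time in seconds, which is exactly what players use for cross-variant alignment.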

#CMAF vs MPEG-TS

For HLS specifically, CMAF replaces MPEG-2 Transport Stream segments. Why use CMAF over TS:

  • Better encryption — full CENC support in CMAF; TS is limited to full-segment AES-128 or SAMPLE-AES, with weaker key-delivery options.
  • Smaller segments — TS has framing overhead per packet; CMAF is more compact.
  • DASH compatibility — same segments serve both HLS and DASH. TS can't.
  • Modern codec support — HEVC and AV1 in TS exists but is finicky; CMAF handles them cleanly.
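The framing-overhead point is quantifiable: every TS packet is 188 bytes with at least a 4-byte header, so the container's floor overhead is about 2% before PES headers, PSI tables, and padding in partially filled packets. A back-of-envelope sketch:

```python
TS_PACKET = 188   # fixed TS packet size in bytes
TS_HEADER = 4     # minimum header; adaptation fields add more

def ts_min_overhead() -> float:
    return TS_HEADER / TS_PACKET

def ts_packets_for(payload_bytes: int) -> int:
    # Ceiling division: a partially filled packet still costs a full 188 bytes.
    usable = TS_PACKET - TS_HEADER
    return -(-payload_bytes // usable)

print(f"{ts_min_overhead():.1%}")   # 2.1%
print(ts_packets_for(1_000_000))    # 5435 packets to carry ~1 MB of payload
```

fMP4 amortizes its moof header over the whole fragment instead of paying per 188-byte packet, which is where the "smaller segments" claim comes from.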

The case for staying on TS: legacy clients that don't support fMP4 HLS. iOS 9 and earlier, smart TVs from 2015 and earlier, some embedded set-top boxes. Practical 2026 install base: <2% of consumer streaming traffic. Most pipelines have dropped TS.

#CMAF compatibility caveats

A few places where CMAF compatibility breaks down in practice:

  • Older Apple devices — iOS 9 and earlier don't support fMP4 in HLS; iOS 10+ does. Streaming services targeting the long tail still ship a TS HLS variant for these clients, even when their primary delivery is CMAF.
  • Encryption mode mixing — pipelines that started pre-cbcs-CMAF sometimes have legacy ctr-encrypted segments alongside cbcs-encrypted segments. Players are intolerant of mode-switching mid-stream; you have to encrypt consistently per-track or migrate the archive in a single pass.
  • CMAF chunks vs CMAF fragments — the terms get conflated. In the spec, a chunk is the smallest addressable moof + mdat unit, a fragment is one or more chunks, and a segment is one or more fragments; in low-latency discussions "chunk" is also used loosely for the HTTP-level transfer chunks that carry them. Don't conflate the layers — chunked CMAF produces multiple moof + mdat units per segment, delivered as they're encoded via HTTP chunked transfer.
  • Init segment compatibility across encoder versions — if you re-encode an existing variant with a different encoder version, the new init segment may have subtly different codec parameters. Players cache init segments per URL; if your URLs don't bust the cache, players will mix new media segments with old init segments and break in subtle ways. Bust the cache on encoder upgrades.
  • DRM key rotation — CENC supports per-segment key rotation, but most players cache keys aggressively. Rotate carefully or you'll see "first segment of new key fails to decrypt" failures.

These are the operationally annoying parts of CMAF. None of them are deal-breakers; all of them have caused production incidents at major streaming services that didn't anticipate them.
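On the init-segment caveat specifically, one way to make the cache-bust automatic rather than remembered is to derive the init URL from a content hash, so any encoder upgrade that changes the init bytes changes the URL. A sketch — the naming scheme is ours, hypothetical:

```python
import hashlib

def init_segment_url(variant: str, init_bytes: bytes) -> str:
    """Content-addressed init segment path: new bytes => new URL."""
    digest = hashlib.sha256(init_bytes).hexdigest()[:12]
    return f"{variant}/init-{digest}.mp4"

old_init = b"ftyp...moov with encoder v1 params"   # placeholder bytes
new_init = b"ftyp...moov with encoder v2 params"
print(init_segment_url("1080p", old_init) != init_segment_url("1080p", new_init))  # True
```

The manifests reference the hashed URL, so players can cache init segments as aggressively as they like without ever mixing old init data with newly encoded media segments.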

#What MpegFlow does with CMAF

MpegFlow's DAG runtime expresses CMAF packaging as a discrete stage downstream of the parallel rendition encodes. The partitioner persists each rendition stage and the packaging stage to job_stages with dependency tracking; the packaging stage runs on an FfmpegExecutor worker, consumes the upstream rendition outputs via cross-stage data flow, and emits CMAF init segments + media segments along with the HLS and DASH manifests that reference them.

Today's CMAF emission is FfmpegExecutor-driven. Shaka Packager and the multi-tool Docker image that would back it are on the Phase 2D roadmap, not currently shipped. That means CENC encryption, cbcs vs cenc selection, multi-DRM signaling, and certain advanced LL-CMAF behaviors are operator-side work today (handled in customer packaging tooling alongside MpegFlow) rather than pipeline-native operations. Roadmap, not present.

For chunked CMAF / low-latency live, what the pipeline emits today is constrained by the FFmpeg HLS/DASH muxer's chunking support; production-grade LL-CMAF for tight targets is part of the same Phase 2D conversation about dedicated packagers.

The strict-broker security model handles packaging like any pipeline payload — workers carry zero ambient credentials, content access flows through short-lived presigned URLs scoped per stage, and access is disposed on completion. Sibling cancellation propagates fatal upstream failures so dependent packaging doesn't run on broken renditions; rendition-level partial-success reporting surfaces granular state when a subset fails.

CMAF correctness is one of the parts of streaming infrastructure that fails subtly. We exercise the DASH-IF conformance test corpus and Apple's media-stream-validator against pipeline output during regression validation; the failure modes here are silent and customer-impacting, which is why the packaging path stays under heavy regression discipline even before Phase 2D ships dedicated-packager support.

If you're moving from a TS-based HLS pipeline to CMAF, or going multi-protocol for the first time, that's a conversation we have regularly during onboarding. The unlock is meaningful — typically 30-50% storage cost reduction (no more two-format duplicate encoding) plus the operational simplification of single-pipeline output flowing into multi-DRM packaging downstream.

Tags
  • cmaf
  • protocols
  • streaming
  • fmp4
  • cenc
  • hls
  • dash
See also

Related topics and reading

  • DASH — the standardized streaming protocol that won the spec war and lost the install base
  • HLS — the protocol that won the streaming wars and what to know about it
  • MP4 / fMP4 — the universal video container and the streaming-friendly fragmented variant
Building on this?

Join the MpegFlow beta.

We're shipping the encoder MVP this quarter. If you're wrangling protocols in production, the beta is built for you — no card, no console waiting.
