Live video is a different workload from VOD. The latency budget is fixed (typically 4-8 seconds glass-to-glass for low-latency, 15-30 seconds for standard); the encoder cannot hit pause; the contribution feed never reaches a clean stop. Every architectural decision for live trades latency against reliability, and the wrong tradeoff at any layer compounds at the player.
This document is the reference architecture for production-grade live video on MpegFlow. Live ships in 2026 Q3 — this is the design we're building toward, with the components, capacity sizing, and failure modes documented honestly.
Use cases in scope
You are running:
- Live broadcast or OTT events — sports, news, conferences, gaming, premium event programming
- Multi-bitrate ABR fanout — the same source encoded into 4-6 renditions for client adaptive playback
- End-to-end latency targets of 4-15 seconds depending on the use case (interactive ≤ 4s, broadcast ≤ 8s, on-demand-replay ≤ 15s)
- Contribution from production gear — broadcast cameras, OBS, encoder appliances (Elemental, Haivision, Osprey), or WebRTC contribution from clients
You also have or are willing to set up:
- A managed Kubernetes cluster with GPU node groups for live encoder pools (H.264 / HEVC live encoding without GPU is technically possible but not economical at scale)
- An origin / packager caching layer (we self-host nginx + Varnish or use managed Cloudflare R2 + Workers)
- A CDN with low-latency-streaming support (Cloudflare LL-HLS, Akamai Media Services Live, Fastly's streaming product)
Architecture overview
```mermaid
flowchart LR
subgraph "Contribution"
C1[Broadcaster<br/>SRT 1080p60]
C2[OBS streamer<br/>RTMP 720p30]
C3[WebRTC client<br/>VP8 480p30]
end
subgraph "Ingest layer"
ING[Ingest gateway<br/>SRT-listener<br/>RTMP-listener<br/>WHIP/WHEP]
end
subgraph "Live encoder pool (K8s + KEDA)"
LE1[Live encoder<br/>NVENC GPU<br/>1080p60 master]
LE2[Live transcoder<br/>NVENC GPU<br/>4-rendition ABR]
end
subgraph "Packaging"
PKG[LL-HLS packager<br/>CMAF + chunked transfer<br/>2-second segments<br/>500ms chunks]
end
subgraph "Origin"
ORG[Origin cache<br/>nginx + Varnish<br/>30-second window]
end
subgraph "Delivery"
CDN[CDN<br/>LL-HLS edge]
P[Player<br/>hls.js / Shaka]
end
C1 -->|SRT 1.4MB/s| ING
C2 -->|RTMP 800KB/s| ING
C3 -->|WHIP 300KB/s| ING
ING -->|raw frames| LE1
LE1 -->|H.264 master| LE2
LE2 -->|4 renditions| PKG
PKG -->|HLS manifest + segments| ORG
ORG -->|HTTP| CDN
CDN -->|LL-HLS| P
classDef live fill:#1a1a1c,stroke:#ff6b35,stroke-width:1.5px,color:#f5f5f5
classDef control fill:#0a0a0c,stroke:#71717a,stroke-width:1.2px,color:#a1a1aa
class C1,C2,C3,ING live
class LE1,LE2,PKG,ORG,CDN,P control
```
The shape: contribution flows in via three protocols (SRT for broadcast-grade, RTMP for legacy gear, WHIP/WebRTC for browser-based contribution); an ingest gateway normalizes them into raw frames; the live encoder pool produces a master plus ABR ladder; the LL-HLS packager emits CMAF chunked segments; the origin holds a rolling 30-second window; and the CDN edge serves players.
Latency math
End-to-end glass-to-glass latency is a sum, not a product. Each layer adds:
| Layer | Typical latency contribution | Optimization headroom |
|---|---|---|
| Camera + production switcher | 100-300ms | Hardware-dependent |
| Contribution encoder (camera → SRT/RTMP) | 200-500ms | tune zerolatency, bf=0, refs=1 |
| SRT/RTMP transit | 200-500ms (depends on geo) | Keep contribution geo-close to ingest |
| Ingest gateway processing | 50-150ms | Mostly fixed |
| Live encoder (raw → H.264 ABR) | 300-800ms | NVENC reduces vs CPU; preset matters |
| LL-HLS packager (CMAF chunks) | 500-1000ms (2 chunks of 500ms) | Smaller chunks = less latency, more overhead |
| Origin → CDN propagation | 200-400ms | Blocking playlist reloads + preload hints avoid per-chunk polling |
| CDN edge → player | 100-300ms | Geo + connection quality |
| Player buffering (jitter buffer) | 1-2 seconds | Aggressive buffer = lower latency, more rebuffer |
Summing the table: ~3-6 seconds for low-latency configs, ~6-10 seconds for safer configs. Below 3 seconds end-to-end requires WebRTC delivery; CMAF chunked HLS bottoms out around 3 seconds because of CDN propagation alone.
The architectural decision: pick a latency target up-front and budget every layer against it. The teams that don't fix the budget end up with 12-second latency by accident and spend a quarter trying to figure out why.
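The budget exercise is mechanical enough to script. A minimal sketch, assuming the per-layer ranges from the table above; the names and values are illustrative, not MpegFlow defaults:

```python
# Hypothetical glass-to-glass budget check. Layer ranges mirror the
# table above (milliseconds); adjust to your own measurements.
LAYERS_MS = {
    "camera_and_switcher":  (100, 300),
    "contribution_encoder": (200, 500),
    "contribution_transit": (200, 500),
    "ingest_gateway":       (50, 150),
    "live_encoder":         (300, 800),
    "ll_hls_packager":      (500, 1000),
    "origin_to_cdn":        (200, 400),
    "edge_to_player":       (100, 300),
    "player_buffer":        (1000, 2000),
}

def budget_report(target_ms: int) -> None:
    best = sum(lo for lo, _ in LAYERS_MS.values())
    worst = sum(hi for _, hi in LAYERS_MS.values())
    print(f"best {best/1000:.1f}s, worst {worst/1000:.1f}s, target {target_ms/1000:.1f}s")
    if target_ms < best:
        print("target is below this ladder's floor; consider WebRTC delivery")
    elif target_ms < worst:
        print("feasible only if every layer stays near its optimistic bound")
    else:
        print("target clears the worst case; the budget has slack")

budget_report(8000)  # the broadcast target (<= 8s) from the list above
```

Running this against the table's numbers lands at roughly 2.7-6.0 seconds, which is where the "pick a target up-front" discipline earns its keep: an 8-second target has slack, a 4-second target does not.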
Component walkthrough
Ingest gateway
The ingest gateway terminates contribution feeds and normalizes them into raw frames the encoder pool can consume.
SRT (Secure Reliable Transport) is the broadcast-grade contribution protocol. It runs over UDP with retransmission, encryption, and NAT traversal. Latency overhead is configurable (typically 120-300ms). For broadcast contribution, SRT is the standard.
RTMP (Real-Time Messaging Protocol) is the legacy protocol that most contribution gear (OBS, vMix, older encoder appliances) still defaults to. It runs over TCP, which means it's lossless but TCP retransmission stalls hurt latency. Plan for it; it's not going away.
WHIP/WHEP (WebRTC HTTP Ingestion/Egress Protocol) is the modern browser-contribution path. WebRTC's media stack handles encoding and transit; the ingest gateway accepts the WebRTC offer and bridges to the encoder pool. Sub-500ms contribution-to-ingest latency. The compromise: WebRTC clients renegotiate during connection events, so the ingest layer needs reconnect handling.
The gateway runs as a Deployment with HPA on incoming-stream count. Each pod handles roughly 50-100 concurrent ingests depending on protocol mix. Multi-protocol ingest pods are simpler operationally than per-protocol pods, but per-protocol pods scale more cleanly.
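As a concrete anchor for the SRT path, here is a minimal sketch of a gateway wrapper that terminates one SRT contribution with ffmpeg. It assumes an ffmpeg build with libsrt; the addresses and port are made up, and it remuxes (`-c copy`) rather than decoding to raw frames, to keep the sketch short:

```python
# Hypothetical gateway wrapper: terminate one SRT contribution and
# forward a normalized MPEG-TS to the encoder pool. The production
# gateway decodes to raw frames as described above; this remuxes only.
import subprocess

def run_srt_ingest(listen_port: int, encoder_addr: str) -> subprocess.Popen:
    # ffmpeg's libsrt "latency" option is in microseconds;
    # 120000 us = 120 ms, the low end of the range quoted above.
    src = f"srt://0.0.0.0:{listen_port}?mode=listener&latency=120000"
    cmd = [
        "ffmpeg", "-hide_banner",
        "-i", src,
        "-c", "copy",             # remux only; no transcode at the gateway
        "-f", "mpegts", encoder_addr,
    ]
    return subprocess.Popen(cmd)

proc = run_srt_ingest(9000, "tcp://live-encoder-pool:10000")  # made-up address
proc.wait()
```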
Live encoder pool
The encoder pool runs FFmpeg or vendored encoder binaries with NVENC for H.264/HEVC live encoding. CPU-only live encoding is feasible for low-rendition single-stream cases (think conference recording at 720p single-bitrate); for production multi-bitrate ABR fanout at 1080p60 or 4K, GPU encoding is the only economical path.
A live encoder pod consumes one ingest stream and emits a master H.264 (or HEVC) feed plus the ABR ladder renditions. NVENC on NVIDIA T4 handles 4-6 renditions at 1080p60 simultaneously per GPU; A10 handles the same workload at 4K.
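A hedged sketch of what that encoder invocation can look like, assuming a recent ffmpeg with the p1-p7 NVENC presets; the bitrates, GOP length, and output addressing are illustrative choices, not MpegFlow defaults:

```python
# Build an illustrative 4-rendition ABR ladder command on h264_nvenc:
# split the decoded video once, scale per rendition, encode each output.
import subprocess

RENDITIONS = [  # (name, height, video bitrate) -- example ladder
    ("1080p", 1080, "6000k"),
    ("720p",   720, "3000k"),
    ("480p",   480, "1500k"),
    ("360p",   360,  "800k"),
]

def abr_ladder_cmd(src: str, out_tpl: str) -> list[str]:
    n = len(RENDITIONS)
    split = f"[0:v]split={n}" + "".join(f"[v{i}]" for i in range(n))
    scales = ";".join(
        f"[v{i}]scale=-2:{h}[v{i}o]" for i, (_, h, _) in enumerate(RENDITIONS)
    )
    cmd = ["ffmpeg", "-hide_banner", "-i", src,
           "-filter_complex", split + ";" + scales]
    for i, (name, _, vbr) in enumerate(RENDITIONS):
        cmd += [
            "-map", f"[v{i}o]", "-map", "0:a",
            "-c:v", "h264_nvenc", "-preset", "p4", "-tune", "ll",
            "-b:v", vbr, "-g", "120",   # 2 s GOP at 60 fps
            "-c:a", "aac", "-b:a", "128k",
            "-f", "mpegts", out_tpl.format(name=name),
        ]
    return cmd

# e.g. push each rendition to a made-up packager endpoint:
# subprocess.run(abr_ladder_cmd("tcp://127.0.0.1:10000",
#                               "srt://packager:7000?streamid={name}"))
```

The `-g 120` keyframe interval pins a 2-second GOP at 60 fps, so every rendition's segments cut on the same boundaries, which the packager depends on for rendition switching.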
The K8s primitive: Deployment of encoder pods with KEDA scaling on ingest-pool queue depth. Each pod is sized for one stream (1 ingest + N renditions). Scale to zero is feasible during off-hours; pre-warming pods 5 minutes before scheduled events is standard practice.
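The pre-warm rule reduces to a few lines. A sketch, assuming a scheduled-events feed exists and that the result is fed to KEDA as an external metric; the 5-minute window mirrors the practice above:

```python
# Desired encoder replicas = live ingests now + streams scheduled to
# start within the warm-up window. The event source is an assumption.
from datetime import datetime, timedelta, timezone

WARMUP = timedelta(minutes=5)

def desired_replicas(active_ingests: int,
                     scheduled_starts: list[datetime]) -> int:
    now = datetime.now(timezone.utc)
    upcoming = sum(1 for t in scheduled_starts if now <= t <= now + WARMUP)
    return active_ingests + upcoming  # expose to KEDA as an external metric
```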
The full K8s + KEDA topology covers the operator-pattern coordination that makes this work.
LL-HLS packager
The packager consumes the encoder's ABR ladder and produces CMAF chunked segments suitable for LL-HLS delivery. Two configuration choices dominate latency:
Segment length (typical: 2-4 seconds). Shorter segments reduce latency (player can start playback sooner) but increase manifest churn. 2 seconds is the practical low end for LL-HLS; below that, manifest update rates become the bottleneck.
Chunk length (typical: 500ms-1000ms). LL-HLS uses HTTP chunked transfer encoding to send segment chunks before the segment is complete. Shorter chunks = lower latency but more HTTP overhead. 500ms is a reasonable default for low-latency configs.
The packager publishes chunks to the origin as they complete, advertising them with EXT-X-PART entries in the manifest and an EXT-X-PRELOAD-HINT tag that signals the upcoming chunk. The origin must support blocking playlist reloads (EXT-X-SERVER-CONTROL with CAN-BLOCK-RELOAD=YES) for the latency math to work; the LL-HLS spec originally required HTTP/2 server push but dropped it in favor of blocking reloads, and without them players poll on every chunk and add 200-400ms of polling latency.
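For orientation, a toy emitter for the two tags just mentioned. Real packagers also emit EXT-X-SERVER-CONTROL, byte ranges, and rendition reports, and the URIs here are invented:

```python
# Emit EXT-X-PART tags for published chunks plus one EXT-X-PRELOAD-HINT
# for the next chunk, as a manifest fragment. Toy example only.
def part_tags(seg_index: int, parts_done: int, part_dur: float = 0.5) -> str:
    lines = [
        f'#EXT-X-PART:DURATION={part_dur},URI="seg{seg_index}.part{i}.m4s"'
        for i in range(parts_done)
    ]
    lines.append(
        f'#EXT-X-PRELOAD-HINT:TYPE=PART,URI="seg{seg_index}.part{parts_done}.m4s"'
    )
    return "\n".join(lines)

print(part_tags(seg_index=42, parts_done=3))  # 3 published chunks, 1 hinted
```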
Origin cache
The origin maintains a rolling window of segments (typically 30 seconds) for player join-ahead and rewind. We run nginx with the cache and slice modules tuned for short TTLs, with Varnish in front for object-level caching.
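The rolling window itself is a small amount of logic. A minimal sketch of segment retention, with illustrative types:

```python
# Keep only segments whose end time falls inside the last 30 seconds,
# matching the rolling window described above.
from collections import deque

WINDOW_S = 30.0

class SegmentWindow:
    def __init__(self) -> None:
        self._segs = deque()  # entries: (uri, end_time_seconds)

    def add(self, uri: str, end_time: float) -> None:
        self._segs.append((uri, end_time))
        cutoff = end_time - WINDOW_S
        while self._segs and self._segs[0][1] < cutoff:
            self._segs.popleft()  # evict: fell out of the rewind window

    def manifest_uris(self) -> list[str]:
        return [uri for uri, _ in self._segs]
```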
The origin is also where contractual delivery hooks live: SCTE-35 marker injection for ad insertion, manifest manipulation for blackout policies, and DAI (dynamic ad insertion) handoff. These layer onto the origin via a manifest manipulator stage; we keep them out of the critical path so a manipulator failure doesn't kill the live stream.
CDN handoff
LL-HLS at the CDN edge requires part-aware caching and support for blocking playlist requests. Cloudflare's Stream product handles this well; Akamai Media Services Live and Fastly's streaming products also support LL-HLS. The configuration that matters: chunk-level cache keys (so each LL-HLS part is cached independently) and short TTLs (1-2 seconds for chunks, 30 seconds for completed segments).
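That TTL split fits on one screen. A sketch mirroring the values above; the URL shapes and the zero-TTL manifest rule are assumptions:

```python
# Pick a cache TTL by object class: manifests revalidate every time,
# LL-HLS parts live briefly, completed segments live for the window.
def cache_ttl_seconds(path: str) -> int:
    if path.endswith(".m3u8"):
        return 0            # manifests must always revalidate (assumption)
    if ".part" in path:
        return 2            # LL-HLS parts: cacheable but short-lived
    return 30               # completed CMAF segments
```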
Geographic distribution of edge nodes determines the floor on player-side latency. For sub-5-second end-to-end latency you need a CDN edge within ~200km of the player. This is why CDN choice matters for live more than for VOD — VOD tolerates a 500ms cold-cache fetch; live cannot.
Capacity sizing
For a single 1080p60 live stream with 4 ABR renditions (1080p, 720p, 480p, 360p):
- 1 GPU (T4 or A10) for the live encoder pool
- 1 packager pod (CPU only, ~2 vCPU)
- ~50 Mbps origin egress per stream (sum of all renditions; each viewer pulls only the rendition they're watching)
For a 4K HDR live stream with 5 ABR renditions:
- 1 A10 GPU for the live encoder
- 1 packager pod (CPU only, ~4 vCPU)
- ~100 Mbps origin egress per stream (sum of all renditions)
The dominant cost at audience scale is CDN egress, not encode. A 100K-viewer live event at 5 Mbps average per viewer = 500 Gbps peak CDN throughput. CDN pricing for live egress is typically $0.04-0.10/GB at that scale; the encode pool cost is rounding error against the CDN bill.
For multi-event scaling (e.g., 100 simultaneous live streams), the encode pool dominates: 100 GPUs at $0.50-1.50/hour = $50-150/hour. KEDA scaling makes this efficient for events that don't run 24/7.
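Both cost poles are worth scripting before an event. A sketch using the rates quoted above (estimates, not quotes):

```python
# Compare the two dominant costs: CDN egress for one large audience
# vs GPU-hours for many simultaneous streams (one GPU per stream).
def egress_cost_per_hour(viewers: int, avg_mbps: float,
                         usd_per_gb: float) -> float:
    gbit_per_hour = viewers * avg_mbps * 3600 / 1000   # Mb/s -> Gb/h
    gbyte_per_hour = gbit_per_hour / 8                 # Gb -> GB
    return gbyte_per_hour * usd_per_gb

def encode_cost_per_hour(streams: int, usd_per_gpu_hour: float) -> float:
    return streams * usd_per_gpu_hour

# 100K viewers at 5 Mbps, $0.05/GB: ~$11,250/hour of CDN egress.
print(f"egress  ${egress_cost_per_hour(100_000, 5, 0.05):,.0f}/h")
# 100 simultaneous streams at $1/GPU-hour: $100/hour of encode.
print(f"encode  ${encode_cost_per_hour(100, 1.0):,.0f}/h")
```

At audience scale the two differ by two orders of magnitude, which is the "encode is rounding error" claim in numbers.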
Failure modes and what they cost you
| Failure | Frequency | Customer-visible impact |
|---|---|---|
| Contribution loss (SRT/RTMP disconnect) | A few times per event on any given stream | 1-3 seconds of player rebuffer; auto-recovery |
| Live encoder restart (mid-event) | Rare, hours to days | 2-5 seconds of black frames; player resumes from latest segment |
| Packager pacing failure | During traffic spikes | Latency drift up by 1-3 seconds; recovers within 30 seconds |
| Origin cache invalidation lag | During config changes | Stale segments served briefly (1-2 segments); recovers automatically |
| CDN regional event | A few times per year per CDN | Multi-CDN failover takes 30-60 seconds; players may need to refresh |
| GPU node group unhealthy | Rare | Encoder pool re-schedules; new pod takes 30-60 seconds to ready; affected stream loses ~1 minute of content |
The architecture is designed to degrade gracefully: contribution loss → encoder holds the last frame → packager pads with the held frame → players see a brief still image, not a hard error. No layer is designed to drop and fail; every layer holds the line until upstream recovers.
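The hold-the-line behavior at the encoder/packager boundary reduces to a pacing loop. A conceptual sketch, with an invented emit callback and queue-based frame delivery:

```python
# Emit frames at the expected cadence; if the upstream stalls, re-emit
# the held frame so downstream sees a still image, not a gap.
import queue

FRAME_INTERVAL_S = 1 / 60  # 60 fps cadence

def pace_frames(frames: queue.Queue, emit) -> None:
    last = None
    while True:
        try:
            frame = frames.get(timeout=FRAME_INTERVAL_S)
            if frame is None:       # sentinel: contribution ended cleanly
                return
            last = frame
        except queue.Empty:
            pass                    # upstream stalled: fall through and pad
        if last is not None:
            emit(last)              # real frame, or the held still image
```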
Multi-region considerations
For events with global audiences or contractual region-specific requirements, the architecture extends per-region:
- Ingest gateway in the contribution-source region (low latency from broadcaster to ingest)
- Encoder pool replicated per delivery region (avoids transcontinental egress from a single encoder)
- Packager + origin per region (minimizes manifest staleness)
- CDN handles cross-region distribution natively
The trade-off: per-region encoder pools multiply GPU costs. A typical compromise: one primary encoder pool in the contribution region, with regional packager + origin caches that pull from the primary. This keeps the encoder cost single-region while still delivering low latency to global audiences.
The multi-region failover architecture covers the failover semantics for the VOD case; the live equivalent has the same shape with tighter timing constraints.
Companion architectures
- Kubernetes + KEDA deployment — the cluster topology this builds on (live encoder pools are the same shape as VOD encoder pools, just always-on)
- Strict-broker security — multi-tenant security model that applies to live ingest + encode pools the same way it applies to VOD
- Multi-region failover — failover semantics for the live case
- DRM packaging pipeline — how live + DRM combine when content protection is required
Scope and adjacent concerns
This pattern covers live ingest, encoding, packaging, and delivery for unprotected content. Adjacent concerns:
- DRM-protected live — pair with the DRM packaging architecture. The live packager additionally handles SPEKE key rotation per segment.
- Server-side ad insertion (SSAI) — manifest manipulation on the origin, driven by per-event SCTE-35 markers from the contribution feed. Out of scope here; pair with established providers (Yospace, Brightcove SSAI).
- Captions (live) — typically delivered as a separate WebVTT/CEA-608 stream from the contribution feed, transmuxed by the packager into the manifest. Out of scope here; standard pattern in our broadcast partners' deployments.
- Recording for VOD replay — write the master encoder output to durable storage in parallel with packaging; a minimal sketch follows this list. The recording becomes the mezzanine for VOD pipelines after the live event ends.
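For the recording bullet, one way to do the parallel archive write: append a second, copy-only output to the master encoder's ffmpeg command so the archive never competes with the live transcode. The paths and 6-second chunking are assumptions, and abr_ladder_cmd refers to the illustrative encoder sketch earlier in this document:

```python
# Copy-only archive output, appended to the encoder invocation. Uses
# ffmpeg's segment muxer with strftime-based filenames.
ARCHIVE_OUTPUT = [
    "-map", "0", "-c", "copy",
    "-f", "segment", "-segment_time", "6",   # 6 s archive chunks
    "-strftime", "1", "/mnt/archive/live-%Y%m%dT%H%M%S.ts",  # made-up path
]
# e.g.: cmd = abr_ladder_cmd(src, out_tpl) + ARCHIVE_OUTPUT
```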
Honest scope: where we are vs where we're going
Live encoding ships in MpegFlow's 2026 Q3 release. The architecture above is what we're building toward; today's beta is VOD-only. For teams running live infrastructure today, the practical advice is to pair with established live products (Wowza, AWS MediaLive, Cloudflare Stream Live) for the live path and migrate to MpegFlow's live when it ships — assuming the architecture above matches what your team needs.
The honest reason live ships later than VOD: VOD's failure modes are bounded (a job either succeeds or retries), live's are continuous (every second of the stream is a new opportunity for failure). Building VOD first lets us prove the operational layer (queues, retries, audit, multi-tenant security) before adding the latency budget that live demands.
If your team is evaluating live infrastructure now, the orchestration platform evaluation framework applies just as much to live vendors as to VOD vendors. Most of the seven questions get harder, not easier, in live.