CEA-608/708 captions arrive embedded in broadcast contribution feeds and content masters. Streaming pipelines need WebVTT (or IMSC) for browser delivery. The conversion bridge — extract from broadcast format, parse, time-align, output streaming format — is essential for any pipeline that ingests broadcast content. ccextractor is the open-source standard for this conversion. This page is the engineering reference.
What CEA-608/708 captions are
CEA-608 and CEA-708 are the closed caption standards for North American broadcast TV. CEA-608 dates from analog NTSC; CEA-708 was developed for digital ATSC broadcasts. Both are typically carried in modern delivery as:
- In MPEG-TS: user data in PES packets or separate elementary streams.
- In H.264 / HEVC video: SEI (Supplemental Enhancement Information) messages, specifically user_data_registered_itu_t_t35 SEI with appropriate identifier codes.
- In MP4/CMAF (legacy): less common; some workflows preserve SEI metadata.
For pipelines ingesting broadcast content, the captions ride along with the video bitstream. Extracting them requires SEI message parsing.
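As a rough illustration of what that parsing involves, the sketch below pulls caption byte pairs out of a single SEI payload. It assumes the ATSC A/53 layout (country code 0xB5, provider code 0x0031, user identifier GA94, type code 0x03) and assumes emulation-prevention bytes were already removed by the NAL parser. The names CcTriplet and parse_a53_cc_data are hypothetical and are not part of ccextractor.

```python
from typing import List, NamedTuple


class CcTriplet(NamedTuple):
    valid: bool    # cc_valid flag
    cc_type: int   # 0/1 = CEA-608 field 1/2, 2 = CEA-708 packet data, 3 = CEA-708 packet start
    data: bytes    # the two caption bytes


# Assumed A/53 prefix: country code, provider code, "GA94", user_data_type_code
ATSC_HEADER = bytes([0xB5, 0x00, 0x31]) + b"GA94" + bytes([0x03])


def parse_a53_cc_data(payload: bytes) -> List[CcTriplet]:
    """Return the cc_data triplets in one user_data_registered_itu_t_t35 payload."""
    if not payload.startswith(ATSC_HEADER):
        return []  # some other registered user data, not captions
    body = payload[len(ATSC_HEADER):]
    if len(body) < 2:
        return []
    flags = body[0]
    if not flags & 0x40:  # process_cc_data_flag is clear
        return []
    cc_count = flags & 0x1F
    triplets: List[CcTriplet] = []
    offset = 2  # skip the flags byte and the em_data byte
    for _ in range(cc_count):
        chunk = body[offset:offset + 3]
        if len(chunk) < 3:
            break  # truncated payload; keep what was parsed so far
        triplets.append(CcTriplet(bool(chunk[0] & 0x04), chunk[0] & 0x03, chunk[1:3]))
        offset += 3
    return triplets
```

Decoding the byte pairs into text (control codes, character sets, window commands) is the much larger part of the job, which is why the rest of this page delegates it to ccextractor.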
What ccextractor is
ccextractor is the open-source standard for extracting CEA-608/708 captions and converting to other formats. Maintained on GitHub; broad codec/container support; outputs WebVTT, SRT, TTML, and other caption formats.
Installation:
# Linux
apt install ccextractor
# macOS
brew install ccextractor
# From source
git clone https://github.com/CCExtractor/ccextractor.git
cd ccextractor/linux && ./build
Verify installation:
ccextractor --version
Basic usage
Extract captions from a video file:
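ccextractor input.ts --out=webvtt -o output.vtt
ccextractor auto-detects the input format and finds CEA-608/708 in SEI messages or user data. The output format is a separate setting: SubRip is the documented default, so pass --out=webvtt explicitly and confirm that the file starts with a WEBVTT header. Flag spellings differ between releases; check ccextractor --help for your build.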
For MP4 with H.264 SEI captions:
ccextractor input.mp4 -o output.vtt
For specific service selection (CEA-708 supports multiple services per stream):
ccextractor input.ts --service 1 -o english.vtt
ccextractor input.ts --service 2 -o spanish.vtt
Service 1 is typically the primary language; service 2 is often Spanish in US broadcasts.
Output format options
ccextractor supports multiple output formats:
ccextractor input.ts --out=webvtt -o output.vtt # WebVTT
ccextractor input.ts --out=srt -o output.srt # SubRip (the default format)
ccextractor input.ts --out=smptett -o out.ttml # SMPTE-TT (TTML)
ccextractor input.ts --out=txt # Plain text transcript
ccextractor input.ts --out=spupng # Bitmap subtitles
ccextractor input.ts --out=raw # Raw 608/708 data
For streaming pipelines, WebVTT (HLS) and TTML/IMSC (DASH) are the common outputs. ccextractor produces both reliably.
Roll-up vs pop-on captions
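CEA-608 defines three caption modes (pop-on, roll-up, and paint-on). The two that dominate broadcast content, and that matter most for conversion, are: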
Pop-on captions: caption appears all at once; replaces previous. Common for VOD and most pre-produced content.
Roll-up captions: lines scroll up as new lines arrive. Common for live broadcasts (news, sports).
For WebVTT conversion, roll-up requires special handling:
- ccextractor by default merges roll-up rows into a single multi-line cue.
- Some pipelines prefer separate cues per line.
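- Behavior is configurable via ccextractor's roll-up options, such as --no-rollup to emit one line per cue (spellings differ between releases; check ccextractor --help).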
The choice depends on player rendering preferences. Test on actual target players to verify roll-up handling looks acceptable.
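When the merged roll-up output repeats rows that were already shown, one possible post-processing step is to keep only the rows that are new in each cue. The sketch below assumes cues have already been parsed into (start_ms, end_ms, lines) tuples; the Cue alias and newest_lines_only are hypothetical names, and this does not reproduce ccextractor's own roll-up logic.

```python
from typing import List, Tuple

Cue = Tuple[int, int, List[str]]  # (start_ms, end_ms, text lines)


def newest_lines_only(cues: List[Cue]) -> List[Cue]:
    """Drop rows that were already visible in the previous cue."""
    result: List[Cue] = []
    previous: List[str] = []
    for start, end, lines in cues:
        fresh = [line for line in lines if line not in previous]
        if fresh:  # a cue with nothing new is dropped entirely
            result.append((start, end, fresh))
        previous = lines
    return result
```

The trade-off is that a line genuinely spoken twice in a row is also dropped, so compare the result against the source on a sample before adopting it.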
Multi-service handling
CEA-708 streams can have multiple services:
- Service 1 (primary, usually English).
- Service 2 (often Spanish).
- Services 3-63 (rare; specialty languages or alternate caption tracks).
ccextractor handles each service independently:
ccextractor input.ts --service 1 -o english.vtt
ccextractor input.ts --service 2 -o spanish.vtt
ccextractor input.ts --service 3 -o french.vtt
For pipelines producing multi-language streaming, run ccextractor once per service to produce per-language WebVTT files.
The HLS manifest then references each language as a separate subtitle track:
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",URI="subs/en/index.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Spanish",LANGUAGE="es",URI="subs/es/index.m3u8"
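To keep the per-service commands and the manifest entries from drifting apart, both can be generated from one service map. This is a minimal sketch: the SERVICES mapping, the function names, and the subs/<lang>/index.m3u8 layout are assumptions that mirror the examples above, not a fixed convention.

```python
from typing import Dict, List, Tuple

# service number -> (display name, language tag); values here are illustrative
SERVICES: Dict[int, Tuple[str, str]] = {1: ("English", "en"), 2: ("Spanish", "es")}


def extraction_commands(input_path: str, services: Dict[int, Tuple[str, str]]) -> List[List[str]]:
    """Build one ccextractor argv per service, in service-number order."""
    return [
        ["ccextractor", input_path, "--service", str(number), "-o", f"{lang}.vtt"]
        for number, (_name, lang) in sorted(services.items())
    ]


def media_tags(services: Dict[int, Tuple[str, str]], group_id: str = "subs") -> List[str]:
    """Build the matching EXT-X-MEDIA lines for the HLS multivariant playlist."""
    return [
        f'#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="{group_id}",NAME="{name}",'
        f'LANGUAGE="{lang}",URI="subs/{lang}/index.m3u8"'
        for _number, (name, lang) in sorted(services.items())
    ]
```

Each argv list can be handed to subprocess.run as-is; keeping the commands as data also makes them easy to log and to retry per service.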
Timing alignment
The output WebVTT timestamps need to align with the streaming delivery's media timeline. Several considerations:
MPEG-TS source timestamps:
CEA-608/708 captions in broadcast content reference the MPEG-TS PTS values. ccextractor outputs WebVTT with timestamps based on PTS.
For HLS streaming where MPEG-TS PTS is preserved, this works directly. For pipelines re-multiplexing or re-encoding, timestamps may shift; verify alignment after conversion.
Time offset adjustment:
If the streaming version is offset from broadcast (e.g., starting 5 seconds later), timestamps need adjustment:
ccextractor input.ts --delay -5000 -o output.vtt
--delay shifts timestamps by milliseconds (negative shifts earlier; positive shifts later).
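If the offset is only known after extraction, the same shift can be applied to an existing WebVTT file. The sketch below is one way to do that, assuming well-formed cue timing lines; shift_vtt and its helpers are hypothetical names, and timestamps that would go negative are clamped to zero rather than dropped.

```python
import re

# Matches HH:MM:SS.mmm and the shorter MM:SS.mmm form WebVTT also allows
_TIMESTAMP = re.compile(r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})")


def _to_ms(match: re.Match) -> int:
    hours, minutes, seconds, millis = match.groups()
    return ((int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def _format(ms: int) -> str:
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def shift_vtt(text: str, offset_ms: int) -> str:
    """Shift every cue timing line by offset_ms (negative = earlier)."""
    shifted = []
    for line in text.split("\n"):
        if "-->" in line:  # only touch cue timing lines, never cue text
            line = _TIMESTAMP.sub(lambda m: _format(max(0, _to_ms(m) + offset_ms)), line)
        shifted.append(line)
    return "\n".join(shifted)
```

Cues that end up fully clamped collapse to zero duration, so a production version would probably drop them instead.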
X-TIMESTAMP-MAP for HLS:
For HLS subtitle segments, X-TIMESTAMP-MAP aligns WebVTT to MPEG-TS timeline. ccextractor may or may not generate this depending on output mode. For HLS pipelines, additional X-TIMESTAMP-MAP insertion may be needed (see HLS X-TIMESTAMP-MAP).
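Inserting the header is a small text operation. The sketch below writes X-TIMESTAMP-MAP on the line after WEBVTT, replacing an existing one if present; the function names are hypothetical, and choosing the right MPEGTS value (the PTS that corresponds to the LOCAL cue time) is left to the pipeline.

```python
def pts_ticks(seconds: float) -> int:
    """Convert seconds to 90 kHz MPEG-TS ticks, wrapped to 33 bits."""
    return round(seconds * 90_000) % (1 << 33)


def add_timestamp_map(vtt_text: str, mpegts: int, local: str = "00:00:00.000") -> str:
    """Insert or replace the X-TIMESTAMP-MAP header of a WebVTT segment."""
    lines = vtt_text.split("\n")
    if not lines[0].lstrip("\ufeff").startswith("WEBVTT"):
        raise ValueError("not a WebVTT document")
    header = f"X-TIMESTAMP-MAP=LOCAL:{local},MPEGTS:{mpegts}"
    if len(lines) > 1 and lines[1].startswith("X-TIMESTAMP-MAP="):
        lines[1] = header  # already mapped: overwrite rather than duplicate
    else:
        lines.insert(1, header)
    return "\n".join(lines)
```

Because the function replaces an existing header, it is safe to run it again when the mapping changes.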
Pipeline integration
A typical broadcast-to-streaming pipeline with caption conversion:
- Ingest broadcast contribution feed (SRT/RIST → MPEG-TS).
- Demux TS to extract video and audio streams.
- Extract captions via ccextractor:
ccextractor demuxed_video.ts -o captions.vtt
- Re-encode video for streaming.
- Package with WebVTT subtitles into HLS/DASH.
The caption extraction is parallel to (or sequential with) video processing. The output WebVTT integrates into manifest signaling for streaming delivery.
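The parallel arrangement can be sketched with two callables, one per stage. The names run_stage_pair, encode, and extract are hypothetical stand-ins for whatever launches the real jobs. The assumption made here is that the encode is required while captions are optional.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Tuple


def run_stage_pair(
    encode: Callable[[], str],
    extract: Callable[[], Optional[str]],
) -> Tuple[str, Optional[str]]:
    """Run encode and caption extraction concurrently; only encode is mandatory."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        encode_future = pool.submit(encode)
        caption_future = pool.submit(extract)
        video = encode_future.result()  # an encode failure propagates to the caller
        try:
            captions = caption_future.result()
        except Exception:
            captions = None  # captions are best-effort in this sketch
    return video, captions
```

Threads are sufficient here because each stage mostly waits on an external process.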
ffmpeg subtitle handling
ffmpeg has CEA-608/708 support too, though less mature than ccextractor:
ffmpeg -i input.ts -map 0:s:0 -c:s webvtt output.vtt
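For some content this works directly. Note that -map 0:s:0 selects a subtitle stream, so it only applies when ffmpeg exposes one; captions embedded in the video's SEI or picture user data are typically reached through the lavfi movie source with its subcc output instead. For complex broadcast captions (multiple services, roll-up timing, edge cases), ccextractor produces more reliable output.
For pipelines, use ccextractor for caption extraction and ffmpeg for everything else. They complement each other well.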
Validation
After conversion, verify:
Caption count and timing:
grep -c -- "-->" output.vtt
Counts cue timing lines, one per cue. Compare to the expected count from the source.
Sample inspection:
Open output.vtt; verify cue contents look reasonable; verify timestamps span the expected range.
Player playback:
Play the streaming version with subtitles; verify captions appear at correct times relative to video and dialogue. Test on actual target players.
Cross-language consistency:
For multi-language extraction, verify all languages cover the same time range and produce comparable output volumes (English service should have similar caption count to Spanish service if content is the same).
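These checks are easy to automate. The sketch below computes cue count and covered time range for one file and compares counts across two tracks. CueStats, cue_stats, and comparable are hypothetical names, and the 25% tolerance is an arbitrary starting point rather than a recommended threshold.

```python
import re
from typing import NamedTuple, Optional

_STAMP = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
_CUE = re.compile(rf"^({_STAMP})\s+-->\s+({_STAMP})", re.MULTILINE)


class CueStats(NamedTuple):
    count: int
    first_start_ms: Optional[int]
    last_end_ms: Optional[int]


def _ms(stamp: str) -> int:
    *rest, seconds = stamp.split(":")
    whole, millis = seconds.split(".")
    minutes = int(rest[-1])
    hours = int(rest[0]) if len(rest) == 2 else 0
    return ((hours * 60 + minutes) * 60 + int(whole)) * 1000 + int(millis)


def cue_stats(vtt_text: str) -> CueStats:
    """Count cues and report the time range they cover."""
    spans = [(_ms(start), _ms(end)) for start, end in _CUE.findall(vtt_text)]
    if not spans:
        return CueStats(0, None, None)
    return CueStats(len(spans), min(s for s, _ in spans), max(e for _, e in spans))


def comparable(a: CueStats, b: CueStats, tolerance: float = 0.25) -> bool:
    """True when the cue counts differ by at most `tolerance` of the larger count."""
    larger = max(a.count, b.count)
    return larger == 0 or abs(a.count - b.count) <= tolerance * larger
```

A pipeline could record cue_stats for every extracted track and flag pairs where comparable returns False for manual review.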
Common conversion bugs
Bug 1: Captions extracted but timing wrong.
Source MPEG-TS PCR drift or discontinuity wasn't accounted for. Output captions are off by seconds.
Solution: verify source PCR continuity; re-extract with timing-related flags as needed, such as --goptime (time from GOP timecodes instead of PTS) or --delay for a constant offset.
Bug 2: Roll-up captions look weird in WebVTT.
ccextractor's default roll-up handling may not match player expectations.
Solution: try different roll-up options; test on target players; pick the option that works.
Bug 3: Service number mismatch.
ccextractor extracts service N but content uses different numbering convention.
Solution: inspect what the stream actually carries before choosing a service (for example with --out=report, or by extracting every service with --service all); identify by content.
Bug 4: Special characters mangled.
CEA-608/708 has limited character set; international characters may not encode correctly.
Solution: ccextractor handles most standard cases; for unusual content, verify output and clean up if needed.
Bug 5: Timing windows not aligned with HLS segments.
Captions extracted relative to source PTS; HLS segments use different timeline.
Solution: post-process WebVTT to add X-TIMESTAMP-MAP per HLS segment (see HLS X-TIMESTAMP-MAP).
Operational considerations
Things that matter for caption conversion in production:
- Source reliability — broadcast captions sometimes have errors or auto-generated quality issues. Conversion preserves whatever's in the source.
- Multi-service pipeline — for multi-language broadcasts, automate per-service extraction.
- Timing verification — verify alignment programmatically; manual spot-checks are insufficient at scale.
- Player testing — test on actual target players; CEA-608/708 to WebVTT conversions have edge cases.
- Pipeline error handling — gracefully handle source content without captions; don't fail the whole pipeline because captions are missing.
ccextractor advanced options
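For complex broadcast content, ccextractor supports many options. Flag names have changed between releases, so confirm each against ccextractor --help for your build: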
Output styling:
ccextractor input.ts --webvtt-styling -o styled.vtt
Adds CSS styling derived from CEA-708 attributes (color, position, etc.).
Time codes:
ccextractor input.ts --tcodes -o output.vtt
Outputs timecodes in addition to text. Useful for editorial workflows.
No-timestamps (text-only):
ccextractor input.ts --no-timestamps -o transcript.txt
Useful for transcription workflows that don't need timing.
Specific input format:
ccextractor --input mp4 input.mp4
ccextractor --input ts input.ts
ccextractor --input bin input.bin
For when auto-detection fails or you want to be explicit.
Pipeline error handling
Things that can go wrong during caption extraction:
Content has no captions: ccextractor returns empty output. Handle gracefully — log a warning; continue pipeline; don't fail.
Content has captions but ccextractor can't find them: format issue; investigate; may need to re-encode source with explicit caption preservation.
Captions exist but are corrupted: extracted captions may be partial or malformed. Decide policy: best-effort acceptable, or require complete captions?
Multiple caption sources (e.g., embedded SEI + sidecar SCC files): pipeline must decide which to use. Embedded SEI is most common; sidecars are sometimes more authoritative.
For pipeline robustness, handle each error mode explicitly. Default behavior should be "produce output without captions if extraction fails" rather than "fail the whole pipeline."
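One way to encode that default is a wrapper that never raises and returns None whenever usable captions were not produced. This is a sketch under assumptions: extract_captions is a hypothetical name, the run parameter exists only so the subprocess call can be swapped out in tests, and "usable" is approximated as "the output file contains at least one cue timing line".

```python
import logging
import subprocess
from pathlib import Path
from typing import Callable, Optional

log = logging.getLogger("captions")


def extract_captions(
    input_path: str,
    output_path: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    timeout_s: int = 600,
) -> Optional[Path]:
    """Return the caption file path, or None if extraction produced nothing usable."""
    out = Path(output_path)
    cmd = ["ccextractor", str(input_path), "-o", str(out)]
    try:
        result = run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired) as exc:
        log.warning("caption extraction did not complete: %s", exc)
        return None
    if result.returncode != 0:
        log.warning("ccextractor exited with %s: %s", result.returncode, result.stderr)
        return None
    if not out.exists() or "-->" not in out.read_text(encoding="utf-8", errors="replace"):
        log.warning("no captions found in %s", input_path)
        return None
    return out
```

The caller then packages with or without subtitles depending on whether a path came back. A stricter policy would raise when captions are required.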
What MpegFlow does with caption conversion
MpegFlow's DAG runtime expresses caption extraction as a discrete stage within the broader workflow. The partitioner places the extraction stage on a CcextractorExecutor worker (one of three first-party StageExecutor implementations alongside FfmpegExecutor and FfprobeExecutor); the proto executor field on the stage tells the ExecutorRegistry which binary to dispatch. Per-workflow configuration selects which CEA-708 services to extract; per-stage retry handles transient failures.
For pipelines producing both HLS and DASH delivery, the caption-conversion downstream stage emits WebVTT (HLS) and IMSC/TTML (DASH) from the same extracted source. Cross-stage data flow wires the ccextractor stage output into the packaging stage's caption inputs; sibling cancellation propagates across rendition stages so a fatal extraction failure doesn't waste compute on dependent encodes.
Caption timing is preserved end-to-end via timestamp-preservation discipline at each stage boundary. X-TIMESTAMP-MAP control is not currently a customer-facing knob — the value is whatever the underlying tool emits for the source's PTS. Operators who need a specific X-TIMESTAMP-MAP behavior for a player target run a post-conversion adjustment in their own tooling; native pipeline-level X-TIMESTAMP-MAP control is on the backlog.
The strict-broker security model handles caption conversion the same as any pipeline stage — workers carry zero ambient credentials and receive content access via short-lived presigned URLs scoped per stage, run extraction, emit caption files, and dispose of access on completion.
For customers ingesting broadcast content for streaming distribution, caption conversion is a standard early-pipeline stage. We help customers verify caption quality during onboarding; sometimes broadcast captions need supplemental editorial cleanup before streaming.
The general guidance: ccextractor is the open-source standard for CEA-608/708 to streaming-format conversion. Use it; don't reinvent. The mechanical conversion is solved; the editorial decisions (which services, how to handle roll-up, what timing offsets) are pipeline-specific. Get the pipeline integration right; the conversion itself is reliable.