
ffprobe stream inspection — extracting media info for pipeline automation

Practical guide to ffprobe — common inspection patterns, JSON output for scripting, stream-specific inspection, frame and packet analysis, production usage in pipelines.

By MpegFlow Engineering Team · Encoding · May 9, 2026 · 8 min read · 1,596 words
In this topic
  1. What ffprobe is
  2. Basic invocations
  3. Common inspection patterns
  4. JSON for scripting
  5. Frame and packet inspection
  6. Probing live streams
  7. Complex inspection scenarios
  8. Performance considerations
  9. Container-specific notes
  10. Pipeline integration patterns
  11. Operational considerations
  12. ffprobe output format options
  13. Common ffprobe gotchas
  14. What MpegFlow does with ffprobe

ffprobe is the inspection tool every video pipeline reaches for when it needs to know what's in a media file. Codec, bitrate, resolution, frame rate, color metadata, encryption status, duration — all of it accessible via ffprobe's CLI with structured JSON output. For pipeline automation, ffprobe is the bridge between unknown input files and informed encoding decisions. This page is the engineering reference.

#What ffprobe is

ffprobe is a media inspection tool bundled with ffmpeg. Same libraries (libavformat, libavcodec); different focus. Where ffmpeg processes media, ffprobe describes it.

ffprobe outputs:

  • Format-level metadata (container type, duration, bitrate, etc.).
  • Per-stream metadata (codec, resolution, frame rate, etc.).
  • Per-frame data (pts, dts, type, size).
  • Per-packet data (size, position, timing).

For pipeline automation, ffprobe is the foundation — every "decide encoding parameters based on source" pipeline starts with ffprobe.

#Basic invocations

Quick stream summary:

ffprobe input.mp4

Output (abbreviated):

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
  Duration: 00:01:30.00, start: 0.000000, bitrate: 5000 kb/s
  Stream #0:0(eng): Video: h264 (High), yuv420p, 1920x1080, 4500 kb/s, 25 fps
  Stream #0:1(eng): Audio: aac (LC), 48000 Hz, stereo, fltp, 192 kb/s

For scripting, structured output is better:

ffprobe -v error -show_format -show_streams -print_format json input.mp4

Output is JSON with format-level and per-stream sections. Easy to parse programmatically.

#Common inspection patterns

Get codec name for video stream:

ffprobe -v error -select_streams v:0 -show_entries stream=codec_name -of default=noprint_wrappers=1:nokey=1 input.mp4

Output: h264

Get resolution:

ffprobe -v error -select_streams v:0 -show_entries stream=width,height -of csv=s=x:p=0 input.mp4

Output: 1920x1080

Get duration in seconds:

ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 input.mp4

Output: 90.000000

Get bitrate:

ffprobe -v error -show_entries stream=bit_rate -select_streams v:0 -of default=noprint_wrappers=1:nokey=1 input.mp4

Output: 4500000 (bits per second)

Check HDR signaling:

ffprobe -v error -select_streams v:0 -show_entries stream=color_primaries,color_transfer,color_space -of json input.mp4

Output:

{
  "streams": [{
    "color_primaries": "bt2020",
    "color_transfer": "smpte2084",
    "color_space": "bt2020nc"
  }]
}

This is HDR10 (PQ + BT.2020). For SDR: bt709/bt709/bt709.

Frame count:

ffprobe -v error -count_packets -select_streams v:0 -show_entries stream=nb_read_packets -of csv=p=0 input.mp4

Output: 2250 (for video streams, the packet count is roughly the frame count, since each packet typically carries one frame).
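
The arithmetic behind that count is worth making explicit: dividing the packet count by the container duration recovers the average frame rate. A minimal sketch (the helper name average_fps is ours, not ffprobe's):

```python
def average_fps(nb_packets: int, duration_s: float) -> float:
    """Approximate average frame rate from packet count and duration.

    For video streams one packet typically carries one frame, so
    packets / duration approximates the average fps.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return nb_packets / duration_s

# Using the numbers from the examples above: 2250 packets over 90 s
print(average_fps(2250, 90.0))  # 25.0
```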

#JSON for scripting

For pipeline automation, JSON output is the standard:

ffprobe -v error -print_format json -show_format -show_streams input.mp4 > probe.json

Then parse with jq, Python, or any JSON tool:

import json
import subprocess

result = subprocess.run(
    ['ffprobe', '-v', 'error', '-print_format', 'json', '-show_format', '-show_streams', 'input.mp4'],
    capture_output=True, text=True
)
data = json.loads(result.stdout)

video_stream = next(s for s in data['streams'] if s['codec_type'] == 'video')
print(f"Codec: {video_stream['codec_name']}")
print(f"Resolution: {video_stream['width']}x{video_stream['height']}")
print(f"Frame rate: {video_stream['r_frame_rate']}")
print(f"Pixel format: {video_stream['pix_fmt']}")

For pipeline ingest stages, this kind of probing is the first thing that happens. Drives downstream encoding decisions.

#Frame and packet inspection

For deeper analysis:

Per-frame info:

ffprobe -v error -select_streams v:0 -show_frames -of json input.mp4

Output (abbreviated):

{
  "frames": [
    {
      "media_type": "video",
      "key_frame": 1,
      "pkt_pts_time": "0.000000",
      "pict_type": "I",
      "coded_picture_number": 0,
      ...
    }
  ]
}

Useful for verifying GOP structure (where keyframes are), checking frame types, debugging timing.
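
That GOP verification can be automated. A sketch that measures keyframe spacing from the frames array of -show_frames JSON (keyframe_intervals is a hypothetical helper):

```python
def keyframe_intervals(frames):
    """Given the 'frames' list from `ffprobe -show_frames -of json`,
    return the gaps (in frames) between consecutive keyframes."""
    key_positions = [i for i, f in enumerate(frames) if f.get("key_frame") == 1]
    return [b - a for a, b in zip(key_positions, key_positions[1:])]

# Toy data: a keyframe every 4 frames
frames = [{"key_frame": 1 if i % 4 == 0 else 0,
           "pict_type": "I" if i % 4 == 0 else "P"}
          for i in range(12)]
print(keyframe_intervals(frames))  # [4, 4]
```

A constant interval list confirms a fixed GOP; irregular gaps flag content that will segment poorly.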

Per-packet info:

ffprobe -v error -select_streams v:0 -show_packets -of json input.mp4

Outputs per-packet data: pts, dts, size, position. Useful for:

  • Verifying segment boundaries align with keyframes.
  • Debugging streaming pipelines.
  • Computing per-segment bitrate from packet sizes.
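
The last use case can be sketched concretely. Assuming packets parsed from -show_packets JSON (where size and pts_time arrive as strings), a hypothetical window_bitrate_bps helper:

```python
def window_bitrate_bps(packets, start_s, end_s):
    """Sum packet sizes whose pts_time falls in [start_s, end_s) and
    convert to bits per second. ffprobe JSON reports 'size' and
    'pts_time' as strings, so convert before doing arithmetic."""
    total_bytes = sum(
        int(p["size"]) for p in packets
        if start_s <= float(p["pts_time"]) < end_s
    )
    return total_bytes * 8 / (end_s - start_s)

# Toy data: 2 seconds of 25 fps video, 25 kB per packet
packets = [{"pts_time": str(i * 0.04), "size": "25000"} for i in range(50)]
print(window_bitrate_bps(packets, 0.0, 2.0))  # 5000000.0 (5 Mb/s)
```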

#Probing live streams

ffprobe works on streams, not just files:

ffprobe -v error -i 'srt://host:port?streamid=stream1'
ffprobe -v error -i 'http://example.com/playlist.m3u8'
ffprobe -v error -i 'rtmp://example.com/live/stream'

For live streams, ffprobe captures a snapshot of stream characteristics. Useful for:

  • Verifying that a live stream is actually producing content.
  • Detecting source stream issues (wrong codec, wrong resolution).
  • Pipeline orchestration (deciding downstream config based on the live source).
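
One practical caveat: a dead endpoint can make ffprobe block for a long time. A minimal wrapper sketch that bounds the wait with a subprocess timeout (probe_live is our name, not an ffprobe feature):

```python
import json
import subprocess

def probe_live(url, timeout_s=10, ffprobe="ffprobe"):
    """Snapshot a live source's streams, bounding how long we wait.
    Returns parsed probe JSON on success, None on timeout or error."""
    cmd = [ffprobe, "-v", "error", "-print_format", "json",
           "-show_format", "-show_streams", url]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return json.loads(result.stdout)
```

Returning None instead of raising lets the orchestration layer treat "source not up yet" as a retryable state.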

#Complex inspection scenarios

Detect HDR vs SDR:

def is_hdr(probe_data):
    video = next(s for s in probe_data['streams'] if s['codec_type'] == 'video')
    return video.get('color_transfer') in ['smpte2084', 'arib-std-b67']

Detect interlaced vs progressive:

def is_interlaced(probe_data):
    video = next(s for s in probe_data['streams'] if s['codec_type'] == 'video')
    return video.get('field_order') in ['tt', 'bb', 'tb', 'bt']

Detect encryption:

def is_encrypted(probe_data):
    # CENC-encrypted MP4 tracks surface with 'encv'/'enca' codec tags
    for stream in probe_data['streams']:
        if stream.get('codec_tag_string') in ('encv', 'enca'):
            return True
    return False

These primitives compose into pipeline logic. Each ffprobe call adds detail; the orchestration layer makes decisions based on the combined info.
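
As a concrete illustration of that composition, a toy routing policy built from the same checks (the ladder names are hypothetical):

```python
def route(probe_data):
    """Pick an encoding ladder from probe results -- a toy routing
    policy combining the detection primitives above."""
    video = next(s for s in probe_data["streams"]
                 if s["codec_type"] == "video")
    if video.get("color_transfer") in ("smpte2084", "arib-std-b67"):
        return "hdr-ladder"
    if video.get("field_order") in ("tt", "bb", "tb", "bt"):
        return "deinterlace-then-sdr-ladder"
    return "sdr-ladder"

probe = {"streams": [{"codec_type": "video", "color_transfer": "smpte2084"}]}
print(route(probe))  # hdr-ladder
```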

#Performance considerations

ffprobe is fast — typical file inspection is sub-second. For pipelines processing many files:

  • Per-file ffprobe ~50-200 ms wall time.
  • For 1000 files: ~1-3 minutes total.
  • For 1M files: parallelize.

For production at scale, parallelize ffprobe calls across workers. The work is per-file; trivially parallelizable.
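
A sketch of that fan-out using a thread pool; probe_fn stands in for whatever subprocess wrapper invokes ffprobe per file (probe_all is a hypothetical helper):

```python
from concurrent.futures import ThreadPoolExecutor

def probe_all(paths, probe_fn, workers=8):
    """Run a per-file probe function across many files in parallel.

    ffprobe work is independent per file, so a pool of workers scales
    it until disk or network I/O saturates.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(probe_fn, paths)))

# probe_fn would normally wrap subprocess.run(['ffprobe', ...]);
# a stub stands in here to show the shape of the result.
results = probe_all(["a.mp4", "b.mp4"], lambda p: {"file": p})
print(sorted(results))  # ['a.mp4', 'b.mp4']
```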

For very large files (multi-hour content), -show_packets or -show_frames can be slow because they iterate through the whole file. Use them only when needed; for stream-level info, format and stream entries are fast.

#Container-specific notes

MP4/MOV/CMAF: ffprobe is optimized for these; fast inspection.

MPEG-TS: requires more analysis to determine structure (which PIDs carry what); slower than MP4 inspection but still sub-second typically.

HLS playlists: ffprobe parses the m3u8 and probes the first segment for stream characteristics. For inspecting an entire HLS deployment, multiple ffprobe calls (one per variant) are needed.
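
The per-variant fan-out starts with the master playlist. A sketch that extracts variant URIs to feed those per-variant ffprobe calls (variant_uris is a hypothetical helper; real playlists may also need URIs resolved against the manifest's base URL):

```python
def variant_uris(master_playlist_text):
    """Pull variant playlist URIs out of an HLS master playlist.
    The non-comment line after each #EXT-X-STREAM-INF tag is that
    variant's URI; probe each one separately for full coverage."""
    lines = master_playlist_text.splitlines()
    return [line for i, line in enumerate(lines)
            if line and not line.startswith("#")
            and i > 0 and lines[i - 1].startswith("#EXT-X-STREAM-INF")]

master = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p.m3u8"""
print(variant_uris(master))  # ['1080p.m3u8', '720p.m3u8']
```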

DASH MPDs: similar to HLS — parse manifest, probe first segment.

MKV/WebM: well-supported; fast inspection.

Specialized formats (MXF, AVI, FLV): supported but may be slower or have edge cases.

#Pipeline integration patterns

Pattern 1: Auto-detect source characteristics:

# Determine encoding parameters based on source
codec=$(ffprobe -v error -select_streams v:0 -show_entries stream=codec_name -of default=noprint_wrappers=1:nokey=1 source.mp4)
resolution=$(ffprobe -v error -select_streams v:0 -show_entries stream=width,height -of csv=s=x:p=0 source.mp4)
fps=$(ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate -of default=noprint_wrappers=1:nokey=1 source.mp4)

echo "Source: $codec at $resolution, $fps"
# Use these values to configure downstream encoding

Pattern 2: Validate ingest content:

# Verify ingest meets requirements
height=$(ffprobe -v error -select_streams v:0 -show_entries stream=height -of csv=p=0 ingest.mp4)
if [ "$height" -lt 720 ]; then
    echo "Source resolution too low for premium encoding"
    exit 1
fi

Pattern 3: Detect HDR pipeline routing:

transfer=$(ffprobe -v error -select_streams v:0 -show_entries stream=color_transfer -of default=noprint_wrappers=1:nokey=1 source.mp4)
if [ "$transfer" = "smpte2084" ]; then
    pipeline=hdr
else
    pipeline=sdr
fi

For pipeline automation, ffprobe results drive routing decisions. Different content types, codecs, or characteristics route to different encoding configurations.

#Operational considerations

Things that matter for ffprobe in pipelines:

  • Version pinning — ffprobe output format can change across versions. Pin version for parsers.
  • Error handling — corrupt files produce ffprobe errors. Handle gracefully.
  • Caching — for re-processed content, cache ffprobe results.
  • Parallelization — for high-volume pipelines, parallel ffprobe calls.
  • Network access — probing remote files (HTTP, S3) requires network access; budget accordingly.

#ffprobe output format options

ffprobe supports multiple output formats:

  • default — human-readable text. Good for terminal use.
  • json — structured JSON. Standard for scripting.
  • csv — comma-separated. Spreadsheet-friendly.
  • flat — flat key=value lines. Easy to grep.
  • xml — XML format. Less common.

For pipeline integration, json is the standard. CSV is occasionally useful for one-off analysis or reporting.

The -print_format flag selects format; the alias -of is identical:

ffprobe -of json input.mp4 -show_format
ffprobe -of csv input.mp4 -show_format
ffprobe -of xml input.mp4 -show_format

For pipeline automation, settle on JSON and stick with it. Mixing formats produces brittle parsing.

#Common ffprobe gotchas

Gotcha 1: Stream selection.

-select_streams v:0 selects the first video stream. v alone selects all video streams. A bare index like 0 selects the stream at that index regardless of type. Get the specifier right or you'll inspect the wrong stream.

Gotcha 2: r_frame_rate vs avg_frame_rate.

For VFR content, r_frame_rate (declared rate) and avg_frame_rate (actual average) differ. For CFR content, they're equal. Check both for VFR detection.
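
A sketch of that comparison; ffprobe reports both fields as fractions like 30000/1001, which Python's Fraction parses directly (is_vfr and its tolerance are our choices):

```python
from fractions import Fraction

def is_vfr(video_stream, tolerance=0.01):
    """Heuristic VFR check: declared r_frame_rate vs measured avg_frame_rate.

    Note: avg_frame_rate can be '0/0' for some sources, which Fraction
    rejects -- guard for that case before calling in production.
    """
    r = Fraction(video_stream["r_frame_rate"])
    avg = Fraction(video_stream["avg_frame_rate"])
    return abs(float(r) - float(avg)) > tolerance

print(is_vfr({"r_frame_rate": "30/1", "avg_frame_rate": "30/1"}))        # False
print(is_vfr({"r_frame_rate": "30/1", "avg_frame_rate": "24000/1001"}))  # True
```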

Gotcha 3: Bit rate at format vs stream level.

format.bit_rate is total file bitrate; stream.bit_rate is per-stream. They differ when there are multiple streams. Use format.bit_rate for total; sum stream.bit_rate values for verification.
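
A quick cross-check sketch; bitrate_discrepancy_pct is a hypothetical helper, and note that bit_rate fields arrive as strings and may be absent for some streams:

```python
def bitrate_discrepancy_pct(probe_data):
    """Compare format-level bitrate against the sum of per-stream
    bitrates, as a percentage of the total. Streams without a
    reported bit_rate are skipped."""
    total = int(probe_data["format"]["bit_rate"])
    stream_sum = sum(int(s["bit_rate"]) for s in probe_data["streams"]
                     if "bit_rate" in s)
    return 100.0 * (total - stream_sum) / total

probe = {"format": {"bit_rate": "5000000"},
         "streams": [{"bit_rate": "4500000"}, {"bit_rate": "192000"}]}
print(round(bitrate_discrepancy_pct(probe), 1))  # 6.2
```

A small positive discrepancy is normal container overhead; a large one usually means a stream didn't report bit_rate at all.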

Gotcha 4: Duration in different fields.

Container duration may differ from stream duration. Use format.duration for total file length; stream.duration for individual stream length.

Gotcha 5: Color metadata absence.

If color_primaries, color_transfer, color_space are absent, the source has no signaling. Some players assume Rec.709 defaults; others assume nothing. Don't infer; check what's signaled.
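
A sketch that reports exactly what is signaled rather than guessing (color_signaling is a hypothetical helper):

```python
def color_signaling(video_stream):
    """Report which color metadata fields are present and which are
    missing; missing fields mean the source carries no signaling."""
    fields = ("color_primaries", "color_transfer", "color_space")
    present = {f: video_stream[f] for f in fields if f in video_stream}
    missing = [f for f in fields if f not in video_stream]
    return present, missing

present, missing = color_signaling({"color_primaries": "bt709"})
print(missing)  # ['color_transfer', 'color_space']
```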

#What MpegFlow does with ffprobe

MpegFlow's DAG runtime expresses source inspection through FfprobeExecutor — one of three first-party StageExecutor implementations alongside FfmpegExecutor and CcextractorExecutor. The executor proto field on each stage tells the ExecutorRegistry which binary to dispatch. The partitioner persists the probe stage to job_stages; cross-stage data flow wires its structured output into downstream stages so encoding and packaging parameters can be derived from real source characteristics rather than assumed defaults.

The probe-driven routing pattern is what makes per-title encoding chains work in MpegFlow today: source resolution, frame rate, codec, HDR signaling, and audio characteristics flow from the FfprobeExecutor stage into the partitioner's downstream-stage parameter assembly. Per-stage retry handles transient failures; exit-code classification distinguishes retryable conditions from terminal source-format problems.

For customers ingesting heterogeneous content (different sources, different encoders, different formats), the FfprobeExecutor stage eliminates manual per-asset configuration — the workflow expresses "probe and route", and the runtime materializes the right downstream stages.

The strict-broker security model handles FfprobeExecutor work like any pipeline payload — workers carry no ambient credentials; content access flows through short-lived presigned URLs scoped per stage; access is disposed on completion.

The general guidance: ffprobe is the foundation of automated pipeline behavior. Master it; use it everywhere; don't try to handle media files without first knowing what's in them. Pipelines that skip ffprobe inspection are pipelines waiting to break on the first unexpected content type. The few minutes spent learning ffprobe's flag set pay off across years of pipeline operation.

Tags
  • ffprobe
  • FFmpeg
  • encoding
  • tools
  • inspection
  • automation
See also

Related topics and reading

  • FFmpeg — the multimedia framework that runs nearly all video infrastructure
  • Watermarking and overlays — burning logos, tags, and identifiers into video for streaming
  • FFmpeg filter_complex patterns — branching, merging, and multi-output graphs
Building on this?

Join the MpegFlow beta.

We're shipping the encoder MVP this quarter. If you're wrangling encoding in production, the beta is built for you — no card, no console waiting.
