PSNR — Peak Signal-to-Noise Ratio — is the oldest automated video quality metric in widespread use. It dates from the early days of digital signal processing, well before video streaming was a concern. Despite well-documented limitations as a perceptual quality predictor, it persists in 2026 as the metric most commonly cited in codec research, the most universally understood reference in encoder benchmarking, and a useful diagnostic tool for specific kinds of encoder analysis. This page is the engineering reference for what PSNR actually measures, why it stuck around, and when it's still the right tool.
What PSNR is
PSNR measures how close one image is to another, on a logarithmic scale based on pixel-level squared error. The formula:
MSE = mean of (reference_pixel - distorted_pixel)^2 across all pixels
PSNR = 10 * log10(MAX^2 / MSE)
Where MAX is the maximum possible pixel value (255 for 8-bit content, 1023 for 10-bit, etc.). Higher PSNR = lower error = closer to the reference.
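The formula translates directly to code. A minimal sketch using NumPy (the function name and signature are ours, not from any particular library):

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray, max_value: float = 255.0) -> float:
    """PSNR in dB between two same-shaped arrays of pixel values."""
    diff = reference.astype(np.float64) - distorted.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images: zero error, PSNR is unbounded
    return 10.0 * np.log10(max_value ** 2 / mse)
```

As a sanity check on the scale: a uniform error of 1 code value on 8-bit content gives MSE = 1, so PSNR = 10 * log10(255^2) ≈ 48.13 dB, squarely in the "visually indistinguishable" band below.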
For 8-bit video with values 0-255, the practical PSNR ranges:
- PSNR 50+ dB — visually indistinguishable from source on consumer playback; differences fall below the threshold of perception.
- PSNR 40-50 dB — high quality. Visible difference only on careful expert review.
- PSNR 35-40 dB — acceptable quality for most consumer use. Visible degradation on inspection.
- PSNR 30-35 dB — visible quality issues. Mid-tier streaming.
- PSNR 25-30 dB — significant degradation. Lower-tier streaming.
- PSNR below 25 dB — heavily degraded. Emergency fallback territory.
The dB scale is logarithmic — a 3 dB increase corresponds to a halving of MSE, and a 6 dB increase to a quartering. That is a statement about signal error, not perception; the mapping from dB to perceived quality is content-dependent. The practical point is that small numerical PSNR differences can represent meaningful quality differences.
How PSNR is computed in practice
Per-frame PSNR is computed for each color plane (Y, U, V) separately:
- PSNR-Y — luma channel only. Often the most-cited single metric.
- PSNR-U / PSNR-V — chroma channels.
- PSNR-YUV — typically a weighted average of the per-plane values (commonly 6:1:1 for 4:2:0 content, a convention inherited from standards-body test procedures rather than an exact sample-count ratio).
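The 6:1:1 weighting convention is a one-liner (the function name is illustrative):

```python
def psnr_yuv(psnr_y: float, psnr_u: float, psnr_v: float) -> float:
    """Weighted PSNR-YUV using the common 6:1:1 convention for 4:2:0 content."""
    return (6.0 * psnr_y + psnr_u + psnr_v) / 8.0
```

For example, per-plane values of 40.0 / 42.0 / 42.0 dB combine to 40.5 dB, pulled strongly toward luma.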
For a video sequence, per-frame PSNR is aggregated across all frames. Several aggregation methods:
- Arithmetic mean — simple average. Most common in encoder benchmarks.
- Per-frame minimum — worst-case quality. Useful for identifying frame-level outliers.
- Per-frame standard deviation — quality variance across the sequence.
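The three aggregations above amount to a few lines over the per-frame series (names are ours; Python's statistics module supplies the math):

```python
import statistics

def aggregate_psnr(per_frame: list[float]) -> dict[str, float]:
    """Sequence-level summaries of a per-frame PSNR series."""
    return {
        "mean": statistics.fmean(per_frame),    # arithmetic mean, the benchmark default
        "min": min(per_frame),                  # worst-case frame, surfaces outliers
        "stdev": statistics.pstdev(per_frame),  # quality variance across the sequence
    }
```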
ffmpeg's PSNR filter computes this:
ffmpeg -i reference.mp4 -i distorted.mp4 -lavfi psnr -f null -
Output:
PSNR y:39.821 u:42.109 v:42.456 average:40.421 min:30.123 max:50.892
The average: value is the mean per-frame PSNR computed from the combined MSE of all three planes, with each plane weighted by its sample count, so luma dominates under 4:2:0.
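When the filter is run with its stats_file option, ffmpeg additionally writes one line per frame of space-separated key:value pairs (n:, mse_avg:, psnr_y:, and so on; the exact field set can vary by ffmpeg version). A small parser sketch, assuming that line format:

```python
def parse_psnr_stats(path: str) -> list[dict[str, float]]:
    """Parse ffmpeg psnr-filter per-frame stats lines into a list of dicts."""
    frames = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            # each token is "key:value"; split on the first colon only
            fields = dict(kv.split(":", 1) for kv in line.split())
            # values are numeric; "inf" (identical frames) parses as float("inf")
            frames.append({key: float(value) for key, value in fields.items()})
    return frames
```

From there, the aggregations above (mean, min, standard deviation) can be run on any per-frame field.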
Why PSNR persists
Given that PSNR has been criticized as a perceptual metric for decades, why is it still the most-cited metric in codec literature? Several reasons:
Computational cost — PSNR is essentially free. It's a single squared-error computation per pixel, easily parallelizable, well-understood. SSIM is more expensive; VMAF is much more expensive.
Mathematical simplicity — the formula is one line. The behavior is predictable. There are no trained models, no parameter tuning, no implementation variants. PSNR-Y on the same content should produce the same number across implementations.
Historical comparability — every codec paper from 1990 to today reports PSNR numbers. When you want to compare a new codec proposal to historical baselines, PSNR is the lingua franca.
Encoder optimization target — many codecs are explicitly designed to maximize PSNR (or minimize MSE). Their rate-distortion optimization uses MSE-based metrics internally. Reporting PSNR shows the encoder doing what it was designed to do.
Standards body conventions — ITU-T, ISO/IEC, and IEEE codec evaluation procedures historically used PSNR. The reporting conventions trickle down through industry.
So PSNR persists not because it's the best perceptual metric — it isn't — but because it's the most universal one. Replacing it requires industry coordination that hasn't fully happened.
Where PSNR fails as a perceptual metric
The well-documented limitations:
Doesn't account for content type — PSNR weights every pixel equally, regardless of perceptual importance. A 1 dB drop on a flat sky is treated the same as a 1 dB drop on a face. Humans care much more about the face.
Doesn't account for masking — high-detail or high-motion regions can hide errors that would be visible in flat regions. PSNR doesn't model this; perceptual metrics like SSIM and VMAF do.
Misleads on perceptual codec optimization — codec features that improve perception (e.g., AV1's film grain synthesis, which strips grain and re-synthesizes it) often reduce PSNR because the synthesized grain doesn't match the original pixel-for-pixel. The viewer sees better quality; PSNR says worse.
Ignores temporal artifacts — flicker, judder, motion artifacts. PSNR is computed per-frame; temporal coherence isn't measured.
Color-space sensitivity — PSNR in the encoded YUV space doesn't directly measure quality in the displayed RGB space. Chroma sub-sampling further complicates this.
Doesn't generalize across resolutions — the same numerical PSNR at 1080p and at 4K represents very different perceptual experiences. At a given screen size and viewing distance, 4K pixels subtend a smaller visual angle, so a 4K stream can tolerate a lower PSNR than a 1080p stream and still look the same.
The practical implication: a codec or encoder configuration that wins on PSNR might lose on perceptual quality, and vice versa. Codec researchers and pipeline engineers have known this for 30 years; the metric persists because the ecosystem is built around it.
Where PSNR is still the right tool
Despite the limitations, PSNR has continuing legitimate uses:
Encoder regression testing — when you're comparing two versions of the same encoder (e.g., x264 v0.164 vs v0.165), PSNR is a fast, deterministic, sensitive metric for detecting quality regressions. The ground truth is "this new version does better/worse on PSNR than the old one"; perceptual differences are usually small enough that PSNR captures them.
Lossless / near-lossless validation — for content above ~50 dB PSNR, all the metrics agree the content is essentially perfect. PSNR is a reliable threshold-detector for "is this still effectively lossless?"
Rate-distortion analysis at the codec level — when designing or tuning rate-control algorithms, PSNR is the metric the codec's rate-distortion optimization uses. Reporting PSNR shows what the codec was actually optimizing.
Cross-paper comparison in codec research — every paper reports PSNR. To compare a new technique against the literature, you need PSNR numbers. The other metrics are supplementary.
Component-level analysis — when you want to understand which color plane is causing quality issues, PSNR-Y vs PSNR-U vs PSNR-V is informative. A drop in PSNR-U/V vs PSNR-Y suggests chroma compression is too aggressive.
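As a rough sketch of that component-level check (the thresholds here are illustrative assumptions, not standard values; in 4:2:0 encodes chroma PSNR typically sits a few dB above luma):

```python
def chroma_health(psnr_y: float, psnr_u: float, psnr_v: float) -> str:
    """Heuristic read on per-plane PSNR: chroma normally sits above luma."""
    worst_chroma = min(psnr_u, psnr_v)
    if worst_chroma < psnr_y:
        return "chroma below luma: chroma compression likely too aggressive"
    if worst_chroma - psnr_y < 1.0:
        return "chroma barely above luma: worth inspecting chroma settings"
    return "chroma/luma balance looks typical"
```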
BD-rate PSNR
Bjontegaard delta-rate (BD-rate) is the standard way to express "codec A is X% more efficient than codec B at equivalent quality." The procedure:
- Encode the same source at multiple bitrates with codec A.
- Encode the same source at multiple bitrates with codec B.
- Compute PSNR for each encoded version vs source.
- Plot bitrate (log scale) vs PSNR for both codecs.
- Fit a curve (conventionally a third-order polynomial in log-bitrate) to each codec's points and average the bitrate gap between the curves over the overlapping PSNR range; this average percentage bitrate difference is the BD-rate.
A negative BD-rate PSNR (e.g., "codec B has -25% BD-rate PSNR vs codec A") means codec B achieves the same PSNR at 25% lower bitrate than codec A.
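A sketch of the computation under the classic third-order-polynomial formulation (function and argument names are ours; production evaluations often use piecewise-cubic interpolation variants rather than a single polynomial fit):

```python
import numpy as np

def bd_rate(rates_a, psnrs_a, rates_b, psnrs_b) -> float:
    """Average % bitrate change of codec B vs codec A at equal PSNR (negative = B better)."""
    # Fit log10(bitrate) as a cubic polynomial of PSNR for each codec
    poly_a = np.polyfit(psnrs_a, np.log10(rates_a), 3)
    poly_b = np.polyfit(psnrs_b, np.log10(rates_b), 3)
    # Integrate both fits over the overlapping PSNR interval
    lo = max(min(psnrs_a), min(psnrs_b))
    hi = min(max(psnrs_a), max(psnrs_b))
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_b = np.polyval(np.polyint(poly_b), hi) - np.polyval(np.polyint(poly_b), lo)
    # Average log-rate difference, converted to a percentage bitrate change
    avg_log_diff = (int_b - int_a) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0
```

If codec B hits the same PSNR points at exactly half the bitrate of codec A, this returns -50%, matching the sign convention above.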
BD-rate PSNR is the most-cited single number in codec comparison literature. AV1 vs HEVC, HEVC vs H.264, VVC vs HEVC — all of these are typically expressed as BD-rate PSNR. The numbers reported are usually the BD-rate at PSNR-Y (luma only).
PSNR alternatives in 2026
Modern codec evaluation typically reports multiple metrics:
- PSNR-Y — historical comparability, fast, well-understood.
- SSIM — better correlation with perception than PSNR, modest extra cost.
- VMAF — best correlation with perception, higher cost, increasingly the production decision metric.
For codec research papers, all three are typically reported. For production decisions, VMAF leads with PSNR as a sanity check. For pipeline regression testing, PSNR is often sufficient because the changes you're trying to detect are typically large enough to register in any metric.
The argument for "just use VMAF and skip PSNR" exists but isn't universal — PSNR remains useful precisely because it measures something different. When a codec change improves PSNR but degrades VMAF, that's a meaningful signal worth investigating; it suggests the change is doing something perceptually questionable that the trained metric is detecting.
What MpegFlow does with PSNR
PSNR runs as a discrete measurement stage in MpegFlow's DAG runtime via the FFmpeg psnr filter, exposed through the quality-analysis node alongside VMAF and SSIM. The stage executes on an FfmpegExecutor worker; cross-stage data flow wires the encode output and reference source into measurement input. Per-stage retry handles transient failures; results land in the workflow's metadata storage.
Customers running quality analysis on their content can configure all three metrics in the same workflow (PSNR, SSIM, and VMAF as parallel measurement stages), which makes it easy to compare results and surface cases where the metrics disagree; those disagreements are often worth attention.
For internal encoder regression testing — "did the SVT-AV1 update we deployed change quality on customer content?" — PSNR is the fast first-pass detector. We compute PSNR on a representative content corpus on every encoder version update and flag content where PSNR moves more than 0.5 dB. Cases that pass PSNR regression testing then go through VMAF analysis for perceptual confirmation.
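The 0.5 dB flagging pass amounts to a simple per-title comparison. A sketch of the idea (data shapes and threshold are illustrative, not MpegFlow's actual implementation):

```python
def flag_regressions(baseline: dict[str, float],
                     candidate: dict[str, float],
                     threshold_db: float = 0.5) -> list[str]:
    """Titles whose mean PSNR moved more than threshold_db between encoder versions."""
    return sorted(
        title for title in baseline
        if abs(candidate[title] - baseline[title]) > threshold_db
    )
```

Flagging on absolute movement rather than only drops is deliberate: an unexpected PSNR increase can also indicate a behavioral change worth perceptual confirmation.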
The strict-broker security model treats PSNR computation like any other workflow stage — workers receive content via short-lived presigned URLs, compute the metric, write results to customer-controlled metadata storage, and dispose of content access on completion. Quality measurement isn't sensitive in the same way DRM keys are, but the discipline is consistent across pipeline stages.
For customers building their own quality programs, our standing recommendation is to compute all three metrics, use VMAF for production decisions, use PSNR for fast regression sanity checks, and use SSIM where computation budget makes VMAF prohibitive but PSNR feels too perceptually limited. The metrics are complementary; using one without the others leaves information on the table.