MpegFlow Blog

MpegFlow with AWS S3: video transcoding architecture

How MpegFlow integrates with AWS S3 — presigned URL pattern, multi-region replication, lifecycle policies, and the IAM-zero strict-broker pattern for production video pipelines.

Stack integration · AWS S3

AWS S3 is the most common object storage choice for video infrastructure — mezzanine assets, encoded outputs, archive footage, and HLS/DASH packaged segments all typically live in S3. The integration question is operational: does the encoder pool need IAM credentials to read S3, and what happens when one of those credentials gets compromised? MpegFlow's approach is the strict-broker pattern: workers receive presigned URLs for specific objects with one-hour TTLs; they hold zero IAM credentials.

How the integration works

MpegFlow's coordinator generates per-job presigned URLs for both the input mezzanine (read) and output destinations (write). Workers receive these URLs through the job spec. They use AWS's standard S3 SDK or curl against the presigned URLs — no AWS credential chain involved. This means a compromised worker can only read the in-flight job's mezzanine and write to its specific output prefix; there's no path to other tenants' buckets, no IAM role to leverage, no cross-customer access.

Common patterns

  • Multi-region with cross-region replication

    For multi-region deployments, point MpegFlow's coordinator at primary buckets in each region. S3 Cross-Region Replication keeps mezzanine assets synced; encoded outputs follow regional placement based on which region's encoder pool processed them. Pair with the multi-region failover architecture for active-active deployments.

  • Intelligent-Tiering for archive workloads

    For large archive migrations, S3 Intelligent-Tiering (or explicit Glacier transitions via lifecycle policies) drops cold-storage costs dramatically. The petabyte archive migration architecture documents the deferred-retrieval patterns that make this affordable.

  • Per-customer prefix isolation

    For multi-tenant deployments, customers have dedicated bucket prefixes (e.g., s3://customer-bucket/cust-acme/). MpegFlow's coordinator generates presigned URLs scoped to specific prefixes; workers cannot reach other customer prefixes even if compromised.

  • Atomic upload via temp prefixes

    Workers write to a temp prefix (s3://bucket/tmp/job-123/) during encode; on success the coordinator renames (S3 copy + delete) to the final prefix. Failed jobs leave temp objects that S3 lifecycle policies clean up automatically. Customers never see partial outputs.
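The temp-prefix promotion above can be sketched as a pure "copy plan": S3 has no atomic move, so the coordinator lists the job's `tmp/` objects and issues a CopyObject + DeleteObject pair for each. The helper and prefix layout below are illustrative.

```python
# Sketch of the tmp-prefix "rename" plan. The coordinator would execute
# each (src, dest) pair as CopyObject + DeleteObject via the S3 SDK;
# abandoned tmp/ objects are left for lifecycle policies to expire.

def promote_plan(keys: list[str], job_id: str, final_prefix: str) -> list[tuple[str, str]]:
    """Map each tmp/<job>/... key to its final destination key."""
    tmp_prefix = f"tmp/{job_id}/"
    plan = []
    for key in keys:
        if not key.startswith(tmp_prefix):
            continue  # ignore objects belonging to other jobs
        plan.append((key, final_prefix + key[len(tmp_prefix):]))
    return plan

# Example: two renditions finished under tmp/job-123/
plan = promote_plan(
    ["tmp/job-123/master.mp4", "tmp/job-123/720p.mp4", "tmp/job-999/x.mp4"],
    "job-123",
    "cust-acme/job-123/",
)
```

Because the copy targets only ever appear under the final prefix after a successful encode, customers listing their prefix never observe partial outputs.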

Pitfalls

  1. Presigned URL TTL too long: 24-hour TTLs sound convenient but extend the blast radius of a leaked URL. Stick to 1-hour TTLs and renew via coordinator if jobs run longer.
  2. Transient S3 errors still happen: S3 has been strongly consistent for read-after-write since December 2020, but throttled requests return 503 SlowDown, and a GET against a Cross-Region Replication replica can 404 until replication catches up. Build retries with backoff into the worker's download logic.
  3. Cross-region GET costs can surprise: if your encoder pool is in us-east-1 but the bucket is in eu-west-1, every encode pays cross-region transfer. Place encoder pools regionally close to source data.
  4. S3 Object Ownership's "BucketOwnerEnforced" setting disables ACLs entirely, and it has been the default for newly created buckets since April 2023. Ensure presigned URL generation works against this setting (it does, but old code paths sometimes assume ACLs).
  5. S3 throughput limits per prefix: ~5,500 GET / 3,500 PUT per second per prefix. At extreme scale (>10K parallel encodes against one customer prefix), distribute across prefixes via hashing.
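The retry advice in pitfall 2 amounts to exponential backoff around the download call. A minimal stdlib sketch, where `fetch` and the transient error type are hypothetical stand-ins for the worker's actual S3 GET:

```python
# Sketch of retry-with-exponential-backoff for worker downloads.
# fetch() and TransientS3Error are illustrative stand-ins for the real
# S3 GET call and its throttling/replication-lag failures.
import time

class TransientS3Error(Exception):
    pass

def download_with_retry(fetch, attempts: int = 5, base_delay: float = 0.5,
                        sleep=time.sleep):
    """Call fetch() until it succeeds or attempts are exhausted."""
    for attempt in range(attempts):
        try:
            return fetch()
        except TransientS3Error:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the coordinator
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

Injecting `sleep` keeps the helper testable; in production you would also want jitter so a fleet of workers does not retry in lockstep.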

At production scale

AWS S3 scales effectively without limit for video workloads, but two cost axes matter at scale. Egress out of AWS (to your CDN or self-hosted edge) can dwarf storage cost above ~10TB/day; pair with CloudFront or another AWS-resident CDN to keep traffic in-region. PUT and GET request counts add up at high job volume — at 10M jobs/month with 5 PUTs each (master, 4 renditions), that's 50M PUTs costing ~$250 in request fees alone.
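The request-fee arithmetic works out as follows, assuming S3 Standard's $0.005 per 1,000 PUT requests (us-east-1 pricing at time of writing):

```python
# Monthly PUT request fees at the job volume described above.
jobs_per_month = 10_000_000
puts_per_job = 5              # master + 4 renditions
put_price_per_1000 = 0.005    # USD, S3 Standard PUT (us-east-1)

total_puts = jobs_per_month * puts_per_job          # 50M PUTs/month
monthly_fee = total_puts / 1000 * put_price_per_1000  # ~$250/month
```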

Topics
  • AWS
  • S3
  • storage
  • integration
  • security
Building this stack?

Talk to us about your specific shape.

The integration patterns above cover most production deployments. If your shape is different — sovereign-cloud, regulated workloads, or scale that needs custom routing — beta cohort design partners get founder-direct help with the integration.

© 2026 MpegFlow, Inc.