MpegFlow with Redis: queues, distributed locks, real-time state
How MpegFlow uses Redis for queues (job dispatch), distributed locks (operator coordination), and real-time state (worker heartbeats). The HA patterns that survive failover events.
Redis is MpegFlow's queue + ephemeral-state store — job dispatch queues, distributed locks for operator leader election, worker heartbeat tracking, and rate-limit counters. We picked Redis for the predictable performance, the well-understood HA story, and the broad ecosystem (Redis Cluster, Redis Sentinel, managed Redis on every cloud).
How the integration works
Single logical Redis (HA-deployed) per MpegFlow deployment. Lists hold per-pool job queues. Sorted sets handle priority queues + delayed retries. Strings store rate-limit counters with TTL-based expiry. Pub/Sub channels broadcast events to all workers. Lua scripts ensure atomic queue operations (popping from queue + setting "in-flight" state in one round-trip).
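The pop-and-claim step can be sketched as a short Lua script invoked from the client. This is an illustrative sketch, not MpegFlow's actual schema: the key names, the in-flight hash layout, and the `atomic_pop` wrapper are assumptions, and `r` is assumed to be a redis-py-style client exposing `eval`.

```python
import time

# Lua runs server-side, so the LPOP and the in-flight HSET happen
# atomically in one round-trip. KEYS[1] = queue list, KEYS[2] =
# in-flight hash; ARGV[1] = worker id, ARGV[2] = claim timestamp.
POP_AND_CLAIM = """
local job = redis.call('LPOP', KEYS[1])
if job then
  redis.call('HSET', KEYS[2], job, ARGV[1] .. ':' .. ARGV[2])
end
return job
"""

def atomic_pop(r, pool_id: str, worker_id: str):
    """Pop one job from the pool queue and mark it in-flight.

    Hash tags ({pool_id}) keep both keys in the same Cluster slot,
    which Lua scripting requires.
    """
    queue = "{%s}:queue" % pool_id
    inflight = "{%s}:inflight" % pool_id
    return r.eval(POP_AND_CLAIM, 2, queue, inflight,
                  worker_id, int(time.time()))
```

Returning the job from the script (rather than popping client-side and writing in-flight state in a second call) is what closes the crash window between "popped" and "claimed".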
Common patterns
Managed Redis for HA
AWS ElastiCache, Google Memorystore, or Redis Cloud for production deployments. Multi-AZ with automatic failover; replication keeps the secondary fresh. Don't run stateful Redis in K8s, for the same reason we avoid it for Postgres: failover, upgrades, and storage orchestration are better left to the managed service.
Redis Cluster for horizontal scale
For deployments above ~10M jobs/month, Redis Cluster shards keys across multiple nodes. Each key maps to one of 16,384 fixed hash slots (CRC16 of the key, mod 16384; not consistent hashing), and slots are assigned to nodes. The trade-off: Lua scripts and other multi-key operations only work on keys in the same hash slot, so design queue key names accordingly.
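Slot assignment is easy to reproduce. Redis Cluster hashes the key with CRC16-CCITT (XMODEM) and takes the result mod 16384; if the key contains a non-empty hash tag between the first `{` and the following `}`, only that substring is hashed. A minimal sketch:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Map a key to one of 16,384 slots, honoring {hash-tag} rules."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # only a non-empty tag counts
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

# Hash tags force colocation: both keys land in the same slot,
# so a Lua script may touch both.
assert hash_slot("{pool-7}:queue") == hash_slot("{pool-7}:lock")
```

This is why the hash-tag convention in the Pitfalls section works: everything inside the braces hashes identically, so per-pool keys stay on one shard.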
Persistence vs ephemeral mode
Redis can be ephemeral (RAM only, fast, lossy on restart) or persistent (AOF + RDB snapshots, slightly slower, durable). MpegFlow uses persistent mode in production — losing in-flight jobs on Redis restart is unacceptable. Pair with PostgreSQL as the source of truth for job state recovery.
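A persistent-mode configuration looks roughly like this. The directives are standard redis.conf options; the snapshot thresholds are illustrative, not MpegFlow's exact values.

```
# Append-only file: log every write, fsync once per second
appendonly yes
appendfsync everysec

# RDB snapshots as a compact fallback for restore and full resync
save 900 1
save 300 10

# Job queues must never be silently evicted (see Pitfalls)
maxmemory-policy noeviction
```

`appendfsync everysec` bounds the worst-case loss on a crash to about one second of writes, which the Postgres source of truth covers on recovery.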
Redis Pub/Sub for control plane events
Worker capacity changes, pool pauses, and operator leader transitions broadcast via Pub/Sub. Workers subscribe at startup; the operator publishes events. Pub/Sub is fire-and-forget — for guaranteed delivery, pair with the Postgres event table.
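The leader transitions mentioned above rest on a distributed lock. A common sketch is SET with NX and a TTL, released only by the token holder via a compare-and-delete Lua script. The lock key name, TTL, and function names below are illustrative, and `r` is assumed to be a redis-py-style client:

```python
import uuid

# Release only if we still hold the lock. GET + DEL must be atomic
# (another operator may have acquired the lock after our TTL expired),
# hence the server-side script.
RELEASE_SCRIPT = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
  return redis.call('DEL', KEYS[1])
end
return 0
"""

def acquire_leadership(r, ttl_ms: int = 10_000):
    """Try to become operator leader; returns a token or None."""
    token = uuid.uuid4().hex
    # NX: only set if absent. PX: auto-expire so a crashed leader
    # cannot hold the lock forever.
    if r.set("operator:leader", token, nx=True, px=ttl_ms):
        return token
    return None

def release_leadership(r, token: str) -> bool:
    return r.eval(RELEASE_SCRIPT, 1, "operator:leader", token) == 1
```

A live leader must also refresh the TTL well before expiry, and must stop acting as leader the moment a refresh fails.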
Pitfalls
- Redis is single-threaded for command execution. Long-running Lua scripts block all other commands. Keep scripts short and well-profiled.
- Redis Cluster requires careful key design — multi-key operations must hash to the same slot. Use hash tags ({pool-id}:queue, {pool-id}:lock) to force colocation.
- Memory limits: Redis evicts based on maxmemory-policy when it hits limits. For job queues, allkeys-lru is wrong (would evict in-flight jobs). Use noeviction and monitor memory pressure aggressively.
- Replication lag during failover can lose recent writes. For critical writes (e.g., job-completion ACKs), use the WAIT command to block until the write is acknowledged by at least one replica.
- Pub/Sub is fire-and-forget — no guarantee subscribers receive the message. For critical events, use a Postgres-backed event table with workers polling.
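The WAIT pitfall can be made concrete. A hedged sketch of a job-completion ACK that refuses to report success until the write has reached a replica; the key layout and function name are illustrative, and `r` is assumed to be a redis-py-style client:

```python
def ack_job_completion(r, pool_id: str, job_id: str) -> bool:
    """Clear the in-flight marker, then require replication before
    reporting success upstream."""
    r.hdel("{%s}:inflight" % pool_id, job_id)
    # WAIT numreplicas timeout_ms: block until at least 1 replica has
    # acknowledged our writes, or 200 ms elapse. Returns the number of
    # replicas actually reached.
    return r.wait(1, 200) >= 1
```

If this returns False, the caller should retry or escalate rather than mark the job done: a failover in that window could resurrect the job, and idempotent completion handling is cheaper than a lost ACK.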
At production scale
Redis at MpegFlow production scale handles 100K-500K commands/sec on a single primary (cache.r6g.xlarge equivalent). Memory consumption is dominated by sorted-set entries for delayed retries — at 1M scheduled retries, expect ~500MB of memory. Above ~10M jobs/month or 5GB working set, move to Redis Cluster. Below that, single-primary HA Redis is operationally simpler.
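The sorted-set figure implies roughly 500 bytes per scheduled retry (member string plus score plus skiplist and dict overhead; the per-entry figure is inferred from the numbers above, not separately measured). Back-of-envelope capacity math:

```python
entries = 1_000_000          # scheduled retries in the sorted set
bytes_per_entry = 500        # assumed average, including overhead
working_set_mib = entries * bytes_per_entry / 1024 ** 2
print(f"{working_set_mib:.0f} MiB")  # roughly the ~500MB quoted above
```

Scaling the same math to the 5GB threshold puts the Cluster cutover around 10M concurrently scheduled entries, consistent with the ~10M jobs/month guidance.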
- redis
- queue
- database
- integration