MpegFlow with Redis: queues, distributed locks, real-time state
How MpegFlow uses Redis for queues (job dispatch), distributed locks (operator coordination), and real-time state (worker heartbeats). The HA patterns that survive failover events.
Redis is MpegFlow's queue + ephemeral-state store — job dispatch queues, distributed locks for operator leader election, worker heartbeat tracking, and rate-limit counters. We picked Redis for the predictable performance, the well-understood HA story, and the broad ecosystem (Redis Cluster, Redis Sentinel, managed Redis on every cloud).
How the integration works
Single logical Redis (HA-deployed) per MpegFlow deployment. Lists hold per-pool job queues. Sorted sets handle priority queues + delayed retries. Strings store rate-limit counters with TTL-based expiry. Pub/Sub channels broadcast events to all workers. Lua scripts ensure atomic queue operations (popping from queue + setting "in-flight" state in one round-trip).
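The pop-and-claim step can be sketched as a short Lua script invoked from the client. This is an illustrative sketch, not MpegFlow's actual schema: the key names, the in-flight hash layout, and the `atomic_pop` wrapper are assumptions, and `r` is assumed to be a redis-py-style client exposing `eval`.

```python
import time

# Lua runs server-side, so the LPOP and the in-flight HSET happen
# atomically in one round-trip. KEYS[1] = queue list, KEYS[2] =
# in-flight hash; ARGV[1] = worker id, ARGV[2] = claim timestamp.
POP_AND_CLAIM = """
local job = redis.call('LPOP', KEYS[1])
if job then
  redis.call('HSET', KEYS[2], job, ARGV[1] .. ':' .. ARGV[2])
end
return job
"""

def atomic_pop(r, pool_id: str, worker_id: str):
    """Pop one job from the pool queue and mark it in-flight.

    Hash tags ({pool_id}) keep both keys in the same Cluster slot,
    which Lua scripting requires.
    """
    queue = "{%s}:queue" % pool_id
    inflight = "{%s}:inflight" % pool_id
    return r.eval(POP_AND_CLAIM, 2, queue, inflight,
                  worker_id, int(time.time()))
```

Returning the job from the script (rather than popping client-side and writing in-flight state in a second call) is what closes the crash window between "popped" and "claimed".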
Common patterns
Managed Redis for HA
AWS ElastiCache, Google Memorystore, or Redis Cloud for production deployments. Multi-AZ with automatic failover; replication keeps the secondary fresh. Don't run stateful Redis in K8s, for the same reason we avoid it for Postgres: failover, upgrades, and storage orchestration are better left to the managed service.
Redis Cluster for horizontal scale
For deployments above ~10M jobs/month, Redis Cluster shards keys across multiple nodes. Each key maps to one of 16,384 fixed hash slots (CRC16 of the key, mod 16384; not consistent hashing), and slots are assigned to nodes. The trade-off: Lua scripts and other multi-key operations only work on keys in the same hash slot, so design queue key names accordingly.
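Slot assignment is easy to reproduce. Redis Cluster hashes the key with CRC16-CCITT (XMODEM) and takes the result mod 16384; if the key contains a non-empty hash tag between the first `{` and the following `}`, only that substring is hashed. A minimal sketch:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Map a key to one of 16,384 slots, honoring {hash-tag} rules."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # only a non-empty tag counts
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

# Hash tags force colocation: both keys land in the same slot,
# so a Lua script may touch both.
assert hash_slot("{pool-7}:queue") == hash_slot("{pool-7}:lock")
```

This is why the hash-tag convention in the Pitfalls section works: everything inside the braces hashes identically, so per-pool keys stay on one shard.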
Persistence vs ephemeral mode
Redis can be ephemeral (RAM only, fast, lossy on restart) or persistent (AOF + RDB snapshots, slightly slower, durable). MpegFlow uses persistent mode in production — losing in-flight jobs on Redis restart is unacceptable. Pair with PostgreSQL as the source of truth for job state recovery.
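A persistent-mode configuration looks roughly like this. The directives are standard redis.conf options; the snapshot thresholds are illustrative, not MpegFlow's exact values.

```
# Append-only file: log every write, fsync once per second
appendonly yes
appendfsync everysec

# RDB snapshots as a compact fallback for restore and full resync
save 900 1
save 300 10

# Job queues must never be silently evicted (see Pitfalls)
maxmemory-policy noeviction
```

`appendfsync everysec` bounds the worst-case loss on a crash to about one second of writes, which the Postgres source of truth covers on recovery.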
Redis Pub/Sub for control plane events
Worker capacity changes, pool pauses, and operator leader transitions broadcast via Pub/Sub. Workers subscribe at startup; the operator publishes events. Pub/Sub is fire-and-forget — for guaranteed delivery, pair with the Postgres event table.
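The leader transitions mentioned above rest on a distributed lock. A common sketch is SET with NX and a TTL, released only by the token holder via a compare-and-delete Lua script. The lock key name, TTL, and function names below are illustrative, and `r` is assumed to be a redis-py-style client:

```python
import uuid

# Release only if we still hold the lock. GET + DEL must be atomic
# (another operator may have acquired the lock after our TTL expired),
# hence the server-side script.
RELEASE_SCRIPT = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
  return redis.call('DEL', KEYS[1])
end
return 0
"""

def acquire_leadership(r, ttl_ms: int = 10_000):
    """Try to become operator leader; returns a token or None."""
    token = uuid.uuid4().hex
    # NX: only set if absent. PX: auto-expire so a crashed leader
    # cannot hold the lock forever.
    if r.set("operator:leader", token, nx=True, px=ttl_ms):
        return token
    return None

def release_leadership(r, token: str) -> bool:
    return r.eval(RELEASE_SCRIPT, 1, "operator:leader", token) == 1
```

A live leader must also refresh the TTL well before expiry, and must stop acting as leader the moment a refresh fails.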
Pitfalls
- Redis is single-threaded for command execution. Long-running Lua scripts block all other commands. Keep scripts short and well-profiled.
- Redis Cluster requires careful key design — multi-key operations must hash to the same slot. Use hash tags ({pool-id}:queue, {pool-id}:lock) to force colocation.
- Memory limits: Redis evicts based on maxmemory-policy when it hits limits. For job queues, allkeys-lru is wrong (would evict in-flight jobs). Use noeviction and monitor memory pressure aggressively.
- Replication lag during failover can lose recent writes. For critical writes (e.g., job-completion ACKs), use the WAIT command to block until the write is acknowledged by at least one replica.
- Pub/Sub is fire-and-forget — no guarantee subscribers receive the message. For critical events, use a Postgres-backed event table with workers polling.
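The WAIT pitfall can be made concrete. A hedged sketch of a job-completion ACK that refuses to report success until the write has reached a replica; the key layout and function name are illustrative, and `r` is assumed to be a redis-py-style client:

```python
def ack_job_completion(r, pool_id: str, job_id: str) -> bool:
    """Clear the in-flight marker, then require replication before
    reporting success upstream."""
    r.hdel("{%s}:inflight" % pool_id, job_id)
    # WAIT numreplicas timeout_ms: block until at least 1 replica has
    # acknowledged our writes, or 200 ms elapse. Returns the number of
    # replicas actually reached.
    return r.wait(1, 200) >= 1
```

If this returns False, the caller should retry or escalate rather than mark the job done: a failover in that window could resurrect the job, and idempotent completion handling is cheaper than a lost ACK.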
At production scale
Redis at MpegFlow production scale handles 100K-500K commands/sec on a single primary (cache.r6g.xlarge equivalent). Memory consumption is dominated by sorted-set entries for delayed retries — at 1M scheduled retries, expect ~500MB of memory. Above ~10M jobs/month or 5GB working set, move to Redis Cluster. Below that, single-primary HA Redis is operationally simpler.
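The sorted-set figure implies roughly 500 bytes per scheduled retry (member string plus score plus skiplist and dict overhead; the per-entry figure is inferred from the numbers above, not separately measured). Back-of-envelope capacity math:

```python
entries = 1_000_000          # scheduled retries in the sorted set
bytes_per_entry = 500        # assumed average, including overhead
working_set_mib = entries * bytes_per_entry / 1024 ** 2
print(f"{working_set_mib:.0f} MiB")  # roughly the ~500MB quoted above
```

Scaling the same math to the 5GB threshold puts the Cluster cutover around 10M concurrently scheduled entries, consistent with the ~10M jobs/month guidance.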
- redis
- queue
- database
- integration