MPR - Media Processor

Media transcoding platform with three execution modes: local (Celery + MinIO), AWS (Step Functions + Lambda + S3), and GCP (Cloud Run Jobs + GCS). Storage is S3-compatible across all environments.

System Overview

Local Architecture (Development)

[Local Architecture diagram]

AWS Architecture (Production)

[AWS Architecture diagram]

GCP Architecture (Production)

[GCP Architecture diagram]

Components

Data Model

Entity Relationships

[Data Model diagram]

Entities

Job Flow

Job Lifecycle

[Job Flow diagram]

Job States

Execution Modes

Media Storage

MPR separates media into input and output paths, each independently configurable. File paths are stored in the database relative to their respective root, which keeps them portable between local development and cloud deployments.

Input / Output Separation

Why Relative Paths?

Local Development

MEDIA_IN=/app/media/in
MEDIA_OUT=/app/media/out

/app/media/
├── in/                    # Source files
│   ├── video1.mp4
│   └── subfolder/video3.mp4
└── out/                   # Transcoded output
    └── video1_h264.mp4

AWS/Cloud Deployment

MEDIA_IN=s3://source-bucket/media/
MEDIA_OUT=s3://output-bucket/transcoded/
MEDIA_BASE_URL=https://source-bucket.s3.amazonaws.com/media/

Database paths remain unchanged because they are already relative; migrating to the cloud only requires uploading the files to S3 and updating the environment variables.
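The root-plus-relative-path scheme can be sketched as a single textual join; `resolve_media_path` below is a hypothetical helper (not the project's actual code) showing why the same database row works against both a local root and an S3 URL.

```python
def resolve_media_path(relative_path: str, root: str) -> str:
    """Join a DB-stored relative path to the configured media root.

    The join is purely textual, so it works the same for local
    roots (/app/media/in) and S3 URLs (s3://source-bucket/media/).
    """
    return root.rstrip("/") + "/" + relative_path.lstrip("/")

# Local development: MEDIA_IN=/app/media/in
assert resolve_media_path("subfolder/video3.mp4", "/app/media/in") \
    == "/app/media/in/subfolder/video3.mp4"

# Cloud deployment: same DB row, different root
assert resolve_media_path("subfolder/video3.mp4", "s3://source-bucket/media/") \
    == "s3://source-bucket/media/subfolder/video3.mp4"
```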

Full Media Storage Documentation →

Chunker Pipeline

The chunker pipeline splits media into time-based segments and streams real-time events from worker threads through Redis and gRPC-Web to the browser UI: seven hops from worker thread to pixel.

Event Path

Worker thread → Pipeline._emit() → event_bridge() → Redis RPUSH
  → [50ms poll] gRPC server LRANGE → yield protobuf
  → HTTP/2 frame → Envoy (grpc-web filter)
  → HTTP/1.1 chunk → nginx (proxy_buffering off)
  → fetch ReadableStream → protobuf-ts decode
  → setEvents([...prev, evt]) → React re-render
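The first half of this path (worker RPUSHes events, gRPC server LRANGEs past a cursor on a 50ms poll) can be sketched as below. The function names, key format, and the `InMemoryRedis` stub are all illustrative assumptions so the sketch runs without a Redis server; a real deployment would use a `redis-py` client with the same `rpush`/`lrange` calls.

```python
import json

class InMemoryRedis:
    """Minimal stand-in for a Redis client (RPUSH/LRANGE only),
    so this sketch runs without a server."""
    def __init__(self):
        self.lists = {}

    def rpush(self, key, value):
        self.lists.setdefault(key, []).append(value)

    def lrange(self, key, start, stop):
        items = self.lists.get(key, [])
        stop = len(items) if stop == -1 else stop + 1
        return items[start:stop]

def event_bridge(client, job_id, event):
    # Worker side: serialize the event and append it to the job's list.
    client.rpush(f"job:{job_id}:events", json.dumps(event))

def poll_events(client, job_id, cursor):
    # gRPC-server side: on each poll, read everything past the
    # last-seen index and advance the cursor.
    raw = client.lrange(f"job:{job_id}:events", cursor, -1)
    return [json.loads(r) for r in raw], cursor + len(raw)

client = InMemoryRedis()
event_bridge(client, "42", {"type": "progress", "pct": 10})
event_bridge(client, "42", {"type": "progress", "pct": 55})
events, cursor = poll_events(client, "42", 0)
assert [e["pct"] for e in events] == [10, 55] and cursor == 2
```

A cursor-based LRANGE keeps the poll loop stateless on the Redis side: the server never pops events, so a reconnecting stream can replay from any offset.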

Thread Model (inside Celery worker)

Celery worker process
  └─ run_job task thread
       └─ Pipeline.run()
            ├─ Producer thread     — enqueues chunks
            ├─ Monitor thread      — emits progress every 500ms
            ├─ Worker thread 0     — pulls from queue, processes
            ├─ Worker thread 1     — pulls from queue, processes
            ├─ Worker thread 2     — pulls from queue, processes
            └─ Worker thread 3     — pulls from queue, processes

Infrastructure

Full Chunker Pipeline Documentation →

API (GraphQL)

All client interactions go through GraphQL at /graphql.

# GraphiQL IDE
http://mpr.local.ar/graphql

# Queries
query { assets(status: "ready") { id filename duration } }
query { jobs(status: "processing") { id status progress } }
query { presets { id name container videoCodec } }
query { systemStatus { status version } }

# Mutations
mutation { scanMediaFolder { found registered skipped } }
mutation { createJob(input: { sourceAssetId: "...", presetId: "..." }) { id status } }
mutation { cancelJob(id: "...") { id status } }
mutation { retryJob(id: "...") { id status } }
mutation { updateAsset(id: "...", input: { comments: "..." }) { id comments } }
mutation { deleteAsset(id: "...") { ok } }

# Lambda callback (REST)
POST /api/jobs/{id}/callback      - Lambda completion webhook
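A client calls these operations by POSTing a JSON body to `/graphql`. The sketch below only builds that body; the `CreateJobInput` type name and the variable shape are assumptions inferred from the mutation listing above, not confirmed schema details.

```python
import json

def graphql_payload(query, variables=None):
    """Build the JSON body a client POSTs to /graphql."""
    return json.dumps({"query": query, "variables": variables or {}}).encode()

# Assumed input type name; check the actual schema in GraphiQL.
CREATE_JOB = """
mutation CreateJob($input: CreateJobInput!) {
  createJob(input: $input) { id status }
}
"""

body = graphql_payload(
    CREATE_JOB,
    {"input": {"sourceAssetId": "a1", "presetId": "p1"}},
)
decoded = json.loads(body)
assert decoded["variables"]["input"]["presetId"] == "p1"
assert "createJob" in decoded["query"]
```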

Supported File Types:

Video: mp4, mkv, avi, mov, webm, flv, wmv, m4v
Audio: mp3, wav, flac, aac, ogg, m4a

Access Points

# Add to /etc/hosts
127.0.0.1 mpr.local.ar

# URLs
http://mpr.local.ar/admin         - Django Admin
http://mpr.local.ar/graphql       - GraphiQL IDE
http://mpr.local.ar/              - Timeline UI
http://mpr.local.ar/chunker/      - Chunker UI
http://localhost:9001              - MinIO Console

# AWS deployment
https://mpr.mcrn.ar/              - Production

Quick Reference

# Render SVGs from DOT files
for f in docs/architecture/*.dot; do dot -Tsvg "$f" -o "${f%.dot}.svg"; done

# Switch executor mode
MPR_EXECUTOR=local    # Celery + MinIO
MPR_EXECUTOR=lambda   # Step Functions + Lambda + S3
MPR_EXECUTOR=gcp      # Cloud Run Jobs + GCS
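Internally, switching modes amounts to dispatching on `MPR_EXECUTOR`; the sketch below illustrates that pattern with made-up backend names (the project's actual executor classes are not shown here).

```python
import os

# Illustrative backend names only; not the project's real classes.
BACKENDS = {
    "local": "CeleryExecutor",         # Celery + MinIO
    "lambda": "StepFunctionsExecutor", # Step Functions + Lambda + S3
    "gcp": "CloudRunExecutor",         # Cloud Run Jobs + GCS
}

def select_executor(env=None):
    """Pick an executor backend from MPR_EXECUTOR, defaulting to local."""
    env = os.environ if env is None else env
    mode = env.get("MPR_EXECUTOR", "local")
    try:
        return BACKENDS[mode]
    except KeyError:
        raise ValueError(f"unknown MPR_EXECUTOR: {mode!r}")

assert select_executor({"MPR_EXECUTOR": "gcp"}) == "CloudRunExecutor"
assert select_executor({}) == "CeleryExecutor"  # default mode
```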