chunker ui redo

2026-03-15 16:03:53 -03:00
parent d5a3372d6b
commit b40bd68411
62 changed files with 5460 additions and 1493 deletions


@@ -0,0 +1,290 @@
# Chunker Pipeline — Execution Path
## Overview
The chunker pipeline splits a media file into time-based segments using FFmpeg stream-copy. Events flow from worker threads through Redis and gRPC-Web streaming to the browser UI in real time.
**7 hops from worker thread to pixel:**
```
Worker thread → Pipeline._emit() → event_bridge() → Redis RPUSH
→ [50ms poll] gRPC server LRANGE → yield protobuf
→ HTTP/2 frame → Envoy (grpc-web filter)
→ HTTP/1.1 chunk → nginx (proxy_buffering off)
→ fetch ReadableStream → protobuf-ts decode
→ setEvents([...prev, evt]) → React re-render
```
---
## Step 1: Job Creation (Browser → GraphQL → Celery)
```
User clicks "Start"
→ App.tsx: handleStart(config)
→ api.ts: createChunkJob(config)
→ POST /graphql (nginx :80 → fastapi:8702)
→ graphql.py: Mutation.create_chunk_job()
→ core.db: creates ChunkJob row in Postgres
→ Celery: run_job.delay(job_type="chunk", job_id=..., payload=...)
→ Returns { id, celery_task_id } to browser
→ App.tsx: setJobId(id) — triggers gRPC stream subscription
```
**Files:** `ui/chunker/src/api.ts`, `core/api/graphql.py`, `core/jobs/task.py`
---
## Step 2: gRPC-Web Stream (Browser → nginx → Envoy → gRPC Server)
Once `jobId` is set, `useGrpcStream(jobId)` opens a server-streaming RPC:
```
useGrpcStream(jobId) fires useEffect
→ GrpcWebFetchTransport({ baseUrl: "/grpc-web" })
→ WorkerServiceClient.streamChunkPipeline({ jobId })
→ fetch() POST to /grpc-web/worker.WorkerService/StreamChunkPipeline
→ nginx :80 /grpc-web/ (proxy_pass → envoy:8090, proxy_buffering off)
→ Envoy :8090 (grpc_web filter: HTTP/1.1 grpc-web → HTTP/2 native gRPC)
→ gRPC server :50051 WorkerServicer.StreamChunkPipeline()
→ Enters Redis polling loop (Step 5)
```
**Files:** `ui/chunker/src/hooks/useGrpcStream.ts`, `ctrl/nginx.conf`, `ctrl/envoy.yaml`, `core/rpc/server.py`
**Key nginx config:** `proxy_buffering off` is critical — without it, nginx collects the entire upstream response before forwarding, defeating streaming entirely.
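A minimal sketch of what that location block might look like in `ctrl/nginx.conf`. The directive names are standard nginx; the upstream address and timeout values are assumptions based on the ports listed in this doc, not a copy of the real config:

```nginx
# Illustrative /grpc-web/ location (values assumed from this doc):
location /grpc-web/ {
    proxy_pass http://envoy:8090/;
    proxy_http_version 1.1;
    proxy_buffering off;        # forward each upstream chunk immediately
    proxy_read_timeout 3600s;   # long-lived stream; assumed value
}
```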
---
## Step 3: Celery Worker → ChunkHandler
```
Celery picks up run_job task
→ task.py: run_job(job_type="chunk", job_id, payload)
→ registry.get_handler("chunk") → ChunkHandler
→ chunk.py: ChunkHandler.process(job_id, payload)
→ download_to_temp(BUCKET_IN, source_key) — pulls source from MinIO/S3
→ Creates output_dir: /app/media/out/chunks/{job_id}/
→ Constructs event_bridge callback (bridges Pipeline events → Redis)
→ pipeline = Pipeline(source, ..., event_callback=event_bridge, output_dir=...)
→ pipeline.run()
```
**Files:** `core/jobs/task.py`, `core/jobs/handlers/chunk.py`
The `event_bridge` closure wraps every `Pipeline._emit()` call, forwarding to `push_event(job_id, event_type, data)` which writes to Redis.
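A sketch of that closure, assuming the `push_event(job_id, event_type, data)` signature described in Step 5 (the real implementation lives in `core/jobs/handlers/chunk.py`; the factory name here is illustrative):

```python
def make_event_bridge(job_id, push_event):
    """Bind job_id so every Pipeline._emit(event_type, data) call is
    forwarded to Redis via push_event(job_id, event_type, data)."""
    def event_bridge(event_type, data):
        # Pipeline knows nothing about Redis or job ids; the closure adds both.
        push_event(job_id, event_type, data)
    return event_bridge
```

The pipeline stays Redis-agnostic: it only ever calls `event_callback(event_type, data)`, and the handler decides where those events go.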
---
## Step 4: Pipeline Orchestration (inside Celery worker process)
`Pipeline.run()` spawns multiple threads:
```
pipeline.run():
├─ Chunker(source, chunk_duration)
│ → ffprobe source file → gets duration, file_size
│ → calculates total_chunks = ceil(duration / chunk_duration)
├─ _emit("pipeline_start", {...}) → event_bridge → Redis
├─ _emit("pipeline_info", {file_size, duration, total_chunks}) → Redis
├─ Creates ChunkQueue(maxsize=10)
├─ Creates WorkerPool(num_workers=N, chunk_queue, processor, event_callback)
├─ pool.start() — spawns N worker threads
├─ MONITOR THREAD starts (_monitor_progress)
│ → Every 500ms: _emit("pipeline_progress", {elapsed, throughput_mbps}) → Redis
├─ PRODUCER THREAD starts (_produce_chunks)
│ → Iterates chunker.chunks() → yields Chunk(sequence, start_time, end_time)
│ → For each: chunk_queue.put(chunk)
│ → _emit("chunk_queued", {sequence, start_time, end_time, queue_size}) → Redis
│ → chunk_queue.close() when done (sends N sentinel Nones)
├─ WORKER THREADS (N concurrent, each runs worker.py:Worker.run())
│ │ Each worker loops:
│ │
│ ├─ chunk = chunk_queue.get(timeout=1.0)
│ ├─ _emit("chunk_processing", {sequence, state:"processing", queue_size}) → Redis
│ │
│ ├─ processor.process(chunk)
│ │ ├─ ffmpeg: runs `ffmpeg -ss start -to end -c copy chunk_NNNN.mp4`
│ │ ├─ simulated_decode: sleep(random) + checksum
│ │ └─ checksum: reads bytes, computes hash
│ │
│ ├─ On success: _emit("chunk_done", {sequence, processing_time, retries, queue_size}) → Redis
│ ├─ On failure: retries with exponential backoff (0.1s, 0.2s, 0.4s...)
│ │ └─ _emit("chunk_retry", {sequence, attempt, backoff}) → Redis
│ │ └─ _emit("chunk_error", {sequence, error, retries}) → Redis (after exhaustion)
│ │
│ └─ On sentinel (None): _emit("worker_status", {state:"stopped"}) → Redis
├─ pool.wait() — joins all worker threads, collects results
├─ monitor_stop.set() — stops progress monitor
├─ ResultCollector — reassembles results in sequence order
│ └─ _emit("chunk_collected", {sequence, buffered, emitted}) → Redis
├─ Writes manifest.json to output_dir
└─ _emit("pipeline_complete", {total_chunks, processed, failed, elapsed, throughput}) → Redis
```
**Files:** `core/chunker/pipeline.py`, `core/chunker/worker.py`, `core/chunker/pool.py`, `core/chunker/chunker.py`, `core/chunker/collector.py`
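The worker's retry behaviour above (doubling backoff starting at 0.1s, `chunk_retry` per attempt, `chunk_error` after exhaustion) can be sketched as a standalone loop. Function and parameter names are illustrative, not the real `worker.py` API:

```python
import time

def process_with_retries(process, chunk, emit, max_retries=3, backoff_base=0.1):
    """Retry with exponential backoff: 0.1s, 0.2s, 0.4s, ... by default."""
    for attempt in range(max_retries + 1):
        try:
            result = process(chunk)
            emit("chunk_done", {"sequence": chunk["sequence"], "retries": attempt})
            return result
        except Exception as exc:
            if attempt == max_retries:
                # Retries exhausted: report the terminal failure and re-raise.
                emit("chunk_error", {"sequence": chunk["sequence"],
                                     "error": str(exc), "retries": attempt})
                raise
            backoff = backoff_base * (2 ** attempt)
            emit("chunk_retry", {"sequence": chunk["sequence"],
                                 "attempt": attempt + 1, "backoff": backoff})
            time.sleep(backoff)
```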
---
## Step 5: Redis — the Event Bus
```
WRITE side (Celery worker, all threads):
push_event(job_id, event_type, data)
→ json.dumps({"event": event_type, ...data})
→ Redis RPUSH to key "chunk_events:{job_id}"
→ Redis EXPIRE 3600 (1 hour TTL)
READ side (gRPC server, StreamChunkPipeline):
poll_events(job_id, cursor)
→ Redis LRANGE "chunk_events:{job_id}" cursor -1
→ Returns (parsed_events, new_cursor)
→ Called every 50ms (time.sleep(0.05) in server loop)
```
Redis acts as a decoupling layer between the Celery worker process (which runs the pipeline) and the gRPC server process (which streams to browsers). Events are appended with RPUSH and read with cursor-based LRANGE polling.
**Files:** `core/events.py`
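The RPUSH/LRANGE cursor semantics map cleanly onto a Python list (RPUSH is an append; `LRANGE key cursor -1` is a slice from the cursor to the end). A stand-in class makes the write/read contract explicit — this is an illustration only, the real `core/events.py` talks to Redis:

```python
import json

class InMemoryEventBus:
    """Stand-in for the Redis list "chunk_events:{job_id}"."""
    def __init__(self):
        self.lists = {}

    def push_event(self, job_id, event_type, data):
        payload = json.dumps({"event": event_type, **data})
        self.lists.setdefault(f"chunk_events:{job_id}", []).append(payload)  # RPUSH

    def poll_events(self, job_id, cursor):
        raw = self.lists.get(f"chunk_events:{job_id}", [])[cursor:]  # LRANGE cursor -1
        return [json.loads(r) for r in raw], cursor + len(raw)
```

Because each reader owns its cursor, multiple gRPC streams can replay the same job's events independently, and a browser that connects late still sees the full history (until the 1-hour TTL expires the key).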
---
## Step 6: gRPC Server → Envoy → nginx → Browser
```
server.py: StreamChunkPipeline polling loop:
while context.is_active():
events, cursor = poll_events(job_id, cursor) ← Redis LRANGE
for data in events:
yield worker_pb2.ChunkPipelineEvent( ← serialized protobuf message
job_id, event_type, sequence, worker_id,
state, queue_size, elapsed, throughput_mbps,
total_chunks, processed_chunks, failed_chunks,
error, processing_time, retries
)
if event_type in ("pipeline_complete", "pipeline_error"):
return ← ends the stream
time.sleep(0.05) ← 50ms poll interval
Each yield sends:
→ gRPC HTTP/2 DATA frame to Envoy
→ Envoy grpc_web filter: HTTP/2 → base64-encoded grpc-web-text
→ nginx proxy_pass (proxy_buffering off) → chunked HTTP/1.1 to browser
→ fetch() ReadableStream in GrpcWebFetchTransport
→ @protobuf-ts decodes protobuf → ChunkPipelineEvent TypeScript object
```
**Files:** `core/rpc/server.py`, `ctrl/envoy.yaml`, `ctrl/nginx.conf`, `ui/common/api/grpc/worker.ts`, `ui/common/api/grpc/worker.client.ts`
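The polling loop's control flow — poll a cursor, yield each event, stop on a terminal event — can be sketched as a plain generator. Here `poll(cursor)` stands in for the Redis `LRANGE` call, and protobuf serialization is omitted:

```python
import time

TERMINAL = {"pipeline_complete", "pipeline_error"}

def stream_events(poll, poll_interval=0.05):
    """Yield events from poll(cursor) -> (events, new_cursor) until a
    terminal event arrives, sleeping between empty polls."""
    cursor = 0
    while True:
        events, cursor = poll(cursor)
        for evt in events:
            yield evt
            if evt["event"] in TERMINAL:
                return              # ends the stream, like the gRPC handler
        if not events:
            time.sleep(poll_interval)  # 50ms poll interval by default
```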
---
## Step 7: React State Derivation and Rendering
```
useGrpcStream.ts:
for await (const msg of stream.responses):
const evt = toEvent(msg) ← maps protobuf camelCase → snake_case PipelineEvent
setEvents(prev => [...prev, evt]) ← appends to events array
if pipeline_complete/error → setDone(true), break
App.tsx useMemo(events):
Iterates ALL events on every update, derives:
├─ chunkMap: Map<sequence, ChunkInfo> — state machine per chunk
│ pending → queued → processing → done/error/retry
├─ workerMap: Map<worker_id, WorkerInfo> — state per worker
│ idle → processing → idle → ... → stopped
├─ stats: PipelineStats
│ total_chunks, processed, failed, retries, elapsed, throughput_mbps, queue_size
├─ errors: ErrorEntry[] — every event containing an error field
└─ queueSize: number — last seen queue_size value
Renders:
├─ ChunkGrid — colored cells per chunk (pending/queued/processing/done/error)
├─ QueueGauge — current queue depth / max
├─ WorkerPanel — per-worker state + current chunk assignment
├─ StatsPanel — elapsed time, throughput, processed/failed counts
├─ ErrorLog — scrollable error list
└─ OutputFiles — download links (when done)
```
**Files:** `ui/chunker/src/hooks/useGrpcStream.ts`, `ui/chunker/src/App.tsx`
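The derivation in `App.tsx` is a fold over the full event list. A Python stand-in for the chunk state machine and stats (simplified: the real `useMemo` also tracks workers, errors, and queue depth):

```python
def derive_state(events):
    """Fold events into per-chunk states and summary counters."""
    chunks, stats = {}, {"processed": 0, "failed": 0, "retries": 0}
    state_for = {"chunk_queued": "queued", "chunk_processing": "processing",
                 "chunk_done": "done", "chunk_error": "error",
                 "chunk_retry": "retry"}
    for evt in events:
        kind = evt["event"]
        if kind in state_for:
            # Later events win: queued -> processing -> done/error.
            chunks[evt["sequence"]] = state_for[kind]
        if kind == "chunk_done":
            stats["processed"] += 1
        elif kind == "chunk_error":
            stats["failed"] += 1
        elif kind == "chunk_retry":
            stats["retries"] += 1
    return chunks, stats
```

Re-deriving from the full list on every event is O(n²) over the run but keeps the reducer stateless and trivially correct; for the event volumes here (hundreds of events) that trade-off is reasonable.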
---
## Step 8: Output File Access (after pipeline completes)
```
App.tsx useEffect([done, jobId]):
→ api.ts: getChunkOutputFiles(jobId)
→ POST /graphql → graphql.py: chunk_output_files(job_id)
→ Reads /app/media/out/chunks/{job_id}/ directory listing from disk
→ Returns [{key, size, url: "/media/out/chunks/{job_id}/chunk_0001.mp4"}]
→ Browser renders download links
→ Click link → nginx /media/out/ → alias /app/media/out/ → serves file from disk
```
Chunks are written directly to `media/out/chunks/{job_id}/` by the ffmpeg processor — no MinIO upload needed for output. Nginx serves them with `autoindex on`.
**Files:** `core/api/graphql.py`, `core/jobs/handlers/chunk.py`, `ctrl/nginx.conf`
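The resolver's job reduces to a directory listing plus URL construction. A sketch under the path conventions stated above (the helper name and glob pattern are assumptions, not the real `graphql.py` code):

```python
from pathlib import Path

def chunk_output_files(media_out, job_id):
    """List a job's chunk files and build the nginx-served URLs."""
    out_dir = Path(media_out) / "chunks" / job_id
    return [{"key": p.name,
             "size": p.stat().st_size,
             "url": f"/media/out/chunks/{job_id}/{p.name}"}
            for p in sorted(out_dir.glob("chunk_*.mp4"))]
```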
---
## Event Types Reference
| Event | Source | Key Fields |
|-------|--------|------------|
| `pipeline_start` | Pipeline.run() | source, chunk_duration, num_workers, processor_type |
| `pipeline_info` | Pipeline.run() | file_size, source_duration, total_chunks |
| `pipeline_progress` | Monitor thread (500ms) | elapsed, throughput_mbps |
| `chunk_queued` | Producer thread | sequence, start_time, end_time, duration, queue_size |
| `chunk_processing` | Worker thread | sequence, worker_id, state, queue_size |
| `chunk_done` | Worker thread | sequence, processing_time, retries, queue_size |
| `chunk_retry` | Worker thread | sequence, attempt, backoff |
| `chunk_error` | Worker thread | sequence, error, retries |
| `chunk_collected` | ResultCollector | sequence, buffered, emitted |
| `worker_status` | Worker thread | worker_id, state (idle/processing/stopped) |
| `pipeline_complete` | Pipeline.run() | total_chunks, processed, failed, elapsed, throughput_mbps |
| `pipeline_error` | Pipeline.run() | error |
---
## Thread Model (inside Celery worker)
```
Celery worker process
└─ run_job task thread
└─ Pipeline.run()
├─ Producer thread — enqueues chunks
├─ Monitor thread — emits progress every 500ms
├─ Worker thread 0 — pulls from queue, processes
├─ Worker thread 1 — pulls from queue, processes
├─ Worker thread 2 — pulls from queue, processes
└─ Worker thread 3 — pulls from queue, processes
```
All threads share the same `event_callback` → `event_bridge` → `push_event()` chain, which opens a new Redis connection per call. Thread safety comes from Redis's atomic RPUSH.
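The producer/worker/sentinel pattern above in miniature, using the stdlib `queue.Queue` (a simplification of `pool.py` — no events, retries, or result ordering):

```python
import queue
import threading

def run_pool(items, handle, num_workers=4):
    """Producer enqueues items, then one None sentinel per worker;
    each worker exits when it pulls its sentinel."""
    q = queue.Queue(maxsize=10)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is None:          # sentinel: stop this worker
                return
            out = handle(item)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:                # producer thread role
        q.put(item)
    for _ in threads:                 # close(): N sentinel Nones
        q.put(None)
    for t in threads:                 # pool.wait()
        t.join()
    return results
```

The bounded queue (`maxsize=10`) provides backpressure: the producer blocks once workers fall behind, which is why `chunk_queued` events report `queue_size`.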
---
## Infrastructure
| Service | Port | Role |
|---------|------|------|
| nginx | 80 | Reverse proxy, static file serving |
| fastapi | 8702 | GraphQL API (Strawberry) |
| celery | — | Task worker (runs pipeline) |
| redis | 6379 | Event bus + Celery broker |
| grpc | 50051 | gRPC server (StreamChunkPipeline) |
| envoy | 8090 | gRPC-Web ↔ native gRPC translation |
| minio | 9000 | S3-compatible source media storage |
| postgres | 5432 | Job/asset metadata |


@@ -1,212 +0,0 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>MPR - Architecture</title>
<link rel="stylesheet" href="styles.css" />
</head>
<body>
<h1>MPR - Media Processor</h1>
<p>
Media transcoding platform with dual execution modes: local (Celery
+ MinIO) and cloud (AWS Step Functions + Lambda + S3).
</p>
<nav>
<a href="#overview">System Overview</a>
<a href="#data-model">Data Model</a>
<a href="#job-flow">Job Flow</a>
<a href="#media-storage">Media Storage</a>
</nav>
<h2 id="overview">System Overview</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Local Architecture (Development)</h3>
<object type="image/svg+xml" data="01a-local-architecture.svg">
<img
src="01a-local-architecture.svg"
alt="Local Architecture"
/>
</object>
<a href="01a-local-architecture.svg" target="_blank"
>Open full size</a
>
</div>
<div class="diagram">
<h3>AWS Architecture (Production)</h3>
<object type="image/svg+xml" data="01b-aws-architecture.svg">
<img
src="01b-aws-architecture.svg"
alt="AWS Architecture"
/>
</object>
<a href="01b-aws-architecture.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Components</h3>
<ul>
<li>
<span class="color-box" style="background: #e8f4f8"></span>
Reverse Proxy (nginx)
</li>
<li>
<span class="color-box" style="background: #f0f8e8"></span>
Application Layer (Django Admin, GraphQL API, Timeline UI)
</li>
<li>
<span class="color-box" style="background: #fff8e8"></span>
Worker Layer (Celery local mode)
</li>
<li>
<span class="color-box" style="background: #fde8d0"></span>
AWS (Step Functions, Lambda - cloud mode)
</li>
<li>
<span class="color-box" style="background: #f8e8f0"></span>
Data Layer (PostgreSQL, Redis)
</li>
<li>
<span class="color-box" style="background: #f0f0f0"></span>
S3 Storage (MinIO local / AWS S3 cloud)
</li>
</ul>
</div>
<h2 id="data-model">Data Model</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Entity Relationships</h3>
<object type="image/svg+xml" data="02-data-model.svg">
<img src="02-data-model.svg" alt="Data Model" />
</object>
<a href="02-data-model.svg" target="_blank">Open full size</a>
</div>
</div>
<div class="legend">
<h3>Entities</h3>
<ul>
<li>
<span class="color-box" style="background: #4a90d9"></span>
MediaAsset - Video/audio files (S3 keys as paths)
</li>
<li>
<span class="color-box" style="background: #50b050"></span>
TranscodePreset - Encoding configurations
</li>
<li>
<span class="color-box" style="background: #d9534f"></span>
TranscodeJob - Processing queue (celery_task_id or
execution_arn)
</li>
</ul>
</div>
<h2 id="job-flow">Job Flow</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Job Lifecycle</h3>
<object type="image/svg+xml" data="03-job-flow.svg">
<img src="03-job-flow.svg" alt="Job Flow" />
</object>
<a href="03-job-flow.svg" target="_blank">Open full size</a>
</div>
</div>
<div class="legend">
<h3>Job States</h3>
<ul>
<li>
<span class="color-box" style="background: #ffc107"></span>
PENDING - Waiting in queue
</li>
<li>
<span class="color-box" style="background: #17a2b8"></span>
PROCESSING - Worker executing
</li>
<li>
<span class="color-box" style="background: #28a745"></span>
COMPLETED - Success
</li>
<li>
<span class="color-box" style="background: #dc3545"></span>
FAILED - Error occurred
</li>
<li>
<span class="color-box" style="background: #6c757d"></span>
CANCELLED - User cancelled
</li>
</ul>
<h3>Execution Modes</h3>
<ul>
<li>
<span class="color-box" style="background: #e8f4e8"></span>
Local: Celery + MinIO (S3 API) + FFmpeg
</li>
<li>
<span class="color-box" style="background: #fde8d0"></span>
Lambda: Step Functions + Lambda + AWS S3
</li>
</ul>
</div>
<h2 id="media-storage">Media Storage</h2>
<div class="diagram-container">
<p>
MPR separates media into input and output paths for flexible
storage configuration.
</p>
<p>
<a href="04-media-storage.md" target="_blank"
>View Media Storage Documentation →</a
>
</p>
</div>
<h2>API (GraphQL)</h2>
<pre><code># GraphiQL IDE
http://mpr.local.ar/graphql
# Queries
query { assets(status: "ready") { id filename duration } }
query { jobs(status: "processing") { id status progress } }
query { presets { id name container videoCodec } }
query { systemStatus { status version } }
# Mutations
mutation { scanMediaFolder { found registered skipped } }
mutation { createJob(input: { sourceAssetId: "...", presetId: "..." }) { id status } }
mutation { cancelJob(id: "...") { id status } }
mutation { retryJob(id: "...") { id status } }
mutation { updateAsset(id: "...", input: { comments: "..." }) { id comments } }
mutation { deleteAsset(id: "...") { ok } }
# Lambda callback (REST)
POST /api/jobs/{id}/callback - Lambda completion webhook</code></pre>
<h2>Access Points</h2>
<pre><code># Local development
127.0.0.1 mpr.local.ar
http://mpr.local.ar/admin - Django Admin
http://mpr.local.ar/graphql - GraphiQL
http://mpr.local.ar/ - Timeline UI
http://localhost:9001 - MinIO Console
# AWS deployment
https://mpr.mcrn.ar/ - Production</code></pre>
<h2>Quick Reference</h2>
<pre><code># Render SVGs from DOT files
for f in *.dot; do dot -Tsvg "$f" -o "${f%.dot}.svg"; done
# Switch executor mode
MPR_EXECUTOR=local # Celery + MinIO
MPR_EXECUTOR=lambda # Step Functions + Lambda + S3</code></pre>
</body>
</html>


@@ -3,6 +3,8 @@
--text-color: #e8e8e8;
--accent-color: #4a90d9;
--border-color: #333;
--sidebar-width: 220px;
--sidebar-bg: #151528;
}
* {
@@ -16,6 +18,59 @@ body {
background-color: var(--bg-color);
color: var(--text-color);
line-height: 1.6;
}
/* Sidebar navigation */
.sidebar {
position: fixed;
top: 0;
left: 0;
width: var(--sidebar-width);
height: 100vh;
background: var(--sidebar-bg);
border-right: 1px solid var(--border-color);
padding: 1.5rem 1rem;
overflow-y: auto;
z-index: 10;
}
.sidebar h2 {
font-size: 1.2rem;
color: var(--accent-color);
margin-bottom: 1.5rem;
padding-bottom: 0.5rem;
border-bottom: 1px solid var(--border-color);
}
.sidebar ul {
list-style: none;
display: flex;
flex-direction: column;
gap: 0.25rem;
}
.sidebar li {
display: block;
}
.sidebar a {
display: block;
padding: 0.4rem 0.6rem;
color: var(--text-color);
text-decoration: none;
font-size: 0.85rem;
border-radius: 4px;
transition: background 0.15s, color 0.15s;
}
.sidebar a:hover {
background: rgba(74, 144, 217, 0.15);
color: var(--accent-color);
}
/* Main content */
.content {
margin-left: var(--sidebar-width);
padding: 2rem;
}
@@ -25,12 +80,13 @@ h1 {
color: var(--accent-color);
}
h2 {
.content > h2 {
font-size: 1.5rem;
margin: 2rem 0 1rem;
color: var(--text-color);
border-bottom: 1px solid var(--border-color);
padding-bottom: 0.5rem;
scroll-margin-top: 1rem;
}
.diagram-container {
@@ -76,20 +132,6 @@ h2 {
text-decoration: underline;
}
nav {
margin-bottom: 2rem;
}
nav a {
color: var(--accent-color);
text-decoration: none;
margin-right: 1.5rem;
}
nav a:hover {
text-decoration: underline;
}
.legend {
margin-top: 2rem;
padding: 1rem;
@@ -141,3 +183,27 @@ pre code {
background: none;
padding: 0;
}
/* Responsive: collapse sidebar on small screens */
@media (max-width: 768px) {
.sidebar {
position: static;
width: 100%;
height: auto;
border-right: none;
border-bottom: 1px solid var(--border-color);
}
.sidebar ul {
flex-direction: row;
flex-wrap: wrap;
}
.content {
margin-left: 0;
}
.diagram {
min-width: 100%;
}
}


@@ -7,219 +7,241 @@
<link rel="stylesheet" href="architecture/styles.css" />
</head>
<body>
<h1>MPR - Media Processor</h1>
<p>
Media transcoding platform with three execution modes: local (Celery
+ MinIO), AWS (Step Functions + Lambda + S3), and GCP (Cloud Run
Jobs + GCS). Storage is S3-compatible across all environments.
</p>
<nav>
<a href="#overview">System Overview</a>
<a href="#data-model">Data Model</a>
<a href="#job-flow">Job Flow</a>
<a href="#media-storage">Media Storage</a>
<nav class="sidebar">
<h2>MPR</h2>
<ul>
<li><a href="#overview">System Overview</a></li>
<li><a href="#data-model">Data Model</a></li>
<li><a href="#job-flow">Job Flow</a></li>
<li><a href="#media-storage">Media Storage</a></li>
<li><a href="#chunker-pipeline">Chunker Pipeline</a></li>
<li><a href="#api">API (GraphQL)</a></li>
<li><a href="#access-points">Access Points</a></li>
<li><a href="#quick-reference">Quick Reference</a></li>
</ul>
</nav>
<h2 id="overview">System Overview</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Local Architecture (Development)</h3>
<object
type="image/svg+xml"
data="architecture/01a-local-architecture.svg"
>
<img
src="architecture/01a-local-architecture.svg"
alt="Local Architecture"
/>
</object>
<a
href="architecture/01a-local-architecture.svg"
target="_blank"
>Open full size</a
>
</div>
<div class="diagram">
<h3>AWS Architecture (Production)</h3>
<object
type="image/svg+xml"
data="architecture/01b-aws-architecture.svg"
>
<img
src="architecture/01b-aws-architecture.svg"
alt="AWS Architecture"
/>
</object>
<a href="architecture/01b-aws-architecture.svg" target="_blank"
>Open full size</a
>
</div>
<div class="diagram">
<h3>GCP Architecture (Production)</h3>
<object
type="image/svg+xml"
data="architecture/01c-gcp-architecture.svg"
>
<img
src="architecture/01c-gcp-architecture.svg"
alt="GCP Architecture"
/>
</object>
<a href="architecture/01c-gcp-architecture.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Components</h3>
<ul>
<li>
<span class="color-box" style="background: #e8f4f8"></span>
Reverse Proxy (nginx)
</li>
<li>
<span class="color-box" style="background: #f0f8e8"></span>
Application Layer (Django Admin, GraphQL API, Timeline UI)
</li>
<li>
<span class="color-box" style="background: #fff8e8"></span>
Worker Layer (Celery local mode)
</li>
<li>
<span class="color-box" style="background: #fde8d0"></span>
AWS (Step Functions, Lambda)
</li>
<li>
<span class="color-box" style="background: #e8f0fd"></span>
GCP (Cloud Run Jobs + GCS)
</li>
<li>
<span class="color-box" style="background: #f8e8f0"></span>
Data Layer (PostgreSQL, Redis)
</li>
<li>
<span class="color-box" style="background: #f0f0f0"></span>
S3-compatible Storage (MinIO / AWS S3 / GCS)
</li>
</ul>
</div>
<h2 id="data-model">Data Model</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Entity Relationships</h3>
<object
type="image/svg+xml"
data="architecture/02-data-model.svg"
>
<img
src="architecture/02-data-model.svg"
alt="Data Model"
/>
</object>
<a href="architecture/02-data-model.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Entities</h3>
<ul>
<li>
<span class="color-box" style="background: #4a90d9"></span>
MediaAsset - Video/audio files with metadata
</li>
<li>
<span class="color-box" style="background: #50b050"></span>
TranscodePreset - Encoding configurations
</li>
<li>
<span class="color-box" style="background: #d9534f"></span>
TranscodeJob - Processing queue items
</li>
</ul>
</div>
<h2 id="job-flow">Job Flow</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Job Lifecycle</h3>
<object
type="image/svg+xml"
data="architecture/03-job-flow.svg"
>
<img src="architecture/03-job-flow.svg" alt="Job Flow" />
</object>
<a href="architecture/03-job-flow.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Job States</h3>
<ul>
<li>
<span class="color-box" style="background: #ffc107"></span>
PENDING - Waiting in queue
</li>
<li>
<span class="color-box" style="background: #17a2b8"></span>
PROCESSING - Worker executing
</li>
<li>
<span class="color-box" style="background: #28a745"></span>
COMPLETED - Success
</li>
<li>
<span class="color-box" style="background: #dc3545"></span>
FAILED - Error occurred
</li>
<li>
<span class="color-box" style="background: #6c757d"></span>
CANCELLED - User cancelled
</li>
</ul>
</div>
<h2 id="media-storage">Media Storage</h2>
<div class="diagram-container">
<main class="content">
<h1>MPR - Media Processor</h1>
<p>
MPR separates media into <strong>input</strong> and
<strong>output</strong> paths, each independently configurable.
File paths are stored
<strong>relative to their respective root</strong> to ensure
portability between local development and cloud deployments (AWS
S3, etc.).
Media transcoding platform with three execution modes: local (Celery
+ MinIO), AWS (Step Functions + Lambda + S3), and GCP (Cloud Run
Jobs + GCS). Storage is S3-compatible across all environments.
</p>
</div>
<div class="legend">
<h3>Input / Output Separation</h3>
<ul>
<li>
<span class="color-box" style="background: #4a90d9"></span>
<code>MEDIA_IN</code> - Source media files to process
</li>
<li>
<span class="color-box" style="background: #50b050"></span>
<code>MEDIA_OUT</code> - Transcoded/trimmed output files
</li>
</ul>
<p><strong>Why Relative Paths?</strong></p>
<ul>
<li>Portability: Same database works locally and in cloud</li>
<li>Flexibility: Easy to switch between storage backends</li>
<li>Simplicity: No need to update paths when migrating</li>
</ul>
</div>
<h2 id="overview">System Overview</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Local Architecture (Development)</h3>
<object
type="image/svg+xml"
data="architecture/01a-local-architecture.svg"
>
<img
src="architecture/01a-local-architecture.svg"
alt="Local Architecture"
/>
</object>
<a
href="architecture/01a-local-architecture.svg"
target="_blank"
>Open full size</a
>
</div>
<div class="diagram">
<h3>AWS Architecture (Production)</h3>
<object
type="image/svg+xml"
data="architecture/01b-aws-architecture.svg"
>
<img
src="architecture/01b-aws-architecture.svg"
alt="AWS Architecture"
/>
</object>
<a href="architecture/01b-aws-architecture.svg" target="_blank"
>Open full size</a
>
</div>
<div class="diagram">
<h3>GCP Architecture (Production)</h3>
<object
type="image/svg+xml"
data="architecture/01c-gcp-architecture.svg"
>
<img
src="architecture/01c-gcp-architecture.svg"
alt="GCP Architecture"
/>
</object>
<a href="architecture/01c-gcp-architecture.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Local Development</h3>
<pre><code>MEDIA_IN=/app/media/in
<div class="legend">
<h3>Components</h3>
<ul>
<li>
<span class="color-box" style="background: #e8f4f8"></span>
Reverse Proxy (nginx)
</li>
<li>
<span class="color-box" style="background: #f0f8e8"></span>
Application Layer (Django Admin, GraphQL API, Timeline UI)
</li>
<li>
<span class="color-box" style="background: #fff8e8"></span>
Worker Layer (Celery local mode)
</li>
<li>
<span class="color-box" style="background: #fde8d0"></span>
AWS (Step Functions, Lambda)
</li>
<li>
<span class="color-box" style="background: #e8f0fd"></span>
GCP (Cloud Run Jobs + GCS)
</li>
<li>
<span class="color-box" style="background: #f8e8f0"></span>
Data Layer (PostgreSQL, Redis)
</li>
<li>
<span class="color-box" style="background: #f0f0f0"></span>
S3-compatible Storage (MinIO / AWS S3 / GCS)
</li>
</ul>
</div>
<h2 id="data-model">Data Model</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Entity Relationships</h3>
<object
type="image/svg+xml"
data="architecture/02-data-model.svg"
>
<img
src="architecture/02-data-model.svg"
alt="Data Model"
/>
</object>
<a href="architecture/02-data-model.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Entities</h3>
<ul>
<li>
<span class="color-box" style="background: #4a90d9"></span>
MediaAsset - Video/audio files with metadata
</li>
<li>
<span class="color-box" style="background: #50b050"></span>
TranscodePreset - Encoding configurations
</li>
<li>
<span class="color-box" style="background: #d9534f"></span>
TranscodeJob - Processing queue items
</li>
</ul>
</div>
<h2 id="job-flow">Job Flow</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Job Lifecycle</h3>
<object
type="image/svg+xml"
data="architecture/03-job-flow.svg"
>
<img src="architecture/03-job-flow.svg" alt="Job Flow" />
</object>
<a href="architecture/03-job-flow.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Job States</h3>
<ul>
<li>
<span class="color-box" style="background: #ffc107"></span>
PENDING - Waiting in queue
</li>
<li>
<span class="color-box" style="background: #17a2b8"></span>
PROCESSING - Worker executing
</li>
<li>
<span class="color-box" style="background: #28a745"></span>
COMPLETED - Success
</li>
<li>
<span class="color-box" style="background: #dc3545"></span>
FAILED - Error occurred
</li>
<li>
<span class="color-box" style="background: #6c757d"></span>
CANCELLED - User cancelled
</li>
</ul>
<h3>Execution Modes</h3>
<ul>
<li>
<span class="color-box" style="background: #e8f4e8"></span>
Local: Celery + MinIO (S3 API) + FFmpeg
</li>
<li>
<span class="color-box" style="background: #fde8d0"></span>
Lambda: Step Functions + Lambda + AWS S3
</li>
<li>
<span class="color-box" style="background: #e8f0fd"></span>
GCP: Cloud Run Jobs + GCS (S3 compat)
</li>
</ul>
</div>
<h2 id="media-storage">Media Storage</h2>
<div class="diagram-container">
<p>
MPR separates media into <strong>input</strong> and
<strong>output</strong> paths, each independently configurable.
File paths are stored
<strong>relative to their respective root</strong> to ensure
portability between local development and cloud deployments.
</p>
</div>
<div class="legend">
<h3>Input / Output Separation</h3>
<ul>
<li>
<span class="color-box" style="background: #4a90d9"></span>
<code>MEDIA_IN</code> - Source media files to process
</li>
<li>
<span class="color-box" style="background: #50b050"></span>
<code>MEDIA_OUT</code> - Transcoded/trimmed output files
</li>
</ul>
<p><strong>Why Relative Paths?</strong></p>
<ul>
<li>Portability: Same database works locally and in cloud</li>
<li>Flexibility: Easy to switch between storage backends</li>
<li>Simplicity: No need to update paths when migrating</li>
</ul>
</div>
<div class="legend">
<h3>Local Development</h3>
<pre><code>MEDIA_IN=/app/media/in
MEDIA_OUT=/app/media/out
/app/media/
@@ -228,52 +250,131 @@ MEDIA_OUT=/app/media/out
│ └── subfolder/video3.mp4
└── out/ # Transcoded output
└── video1_h264.mp4</code></pre>
</div>
</div>
<div class="legend">
<h3>AWS/Cloud Deployment</h3>
<pre><code>MEDIA_IN=s3://source-bucket/media/
<div class="legend">
<h3>AWS/Cloud Deployment</h3>
<pre><code>MEDIA_IN=s3://source-bucket/media/
MEDIA_OUT=s3://output-bucket/transcoded/
MEDIA_BASE_URL=https://source-bucket.s3.amazonaws.com/media/</code></pre>
<p>
Database paths remain unchanged (already relative). Just upload
files to S3 and update environment variables.
</p>
</div>
<p>
Database paths remain unchanged (already relative). Just upload
files to S3 and update environment variables.
</p>
</div>
<div class="legend">
<h3>API (GraphQL)</h3>
<p>
All client interactions go through GraphQL at
<code>/graphql</code>.
<a href="architecture/04-media-storage.md" target="_blank"
>Full Media Storage Documentation &rarr;</a
>
</p>
<ul>
<li>
<code>scanMediaFolder</code> - Scan S3 bucket for media
files
</li>
<li><code>createJob</code> - Create transcode/trim job</li>
<li>
<code>cancelJob / retryJob</code> - Job lifecycle management
</li>
<li>
<code>updateAsset / deleteAsset</code> - Asset management
</li>
</ul>
<p><strong>Supported File Types:</strong></p>
<p>
Video: mp4, mkv, avi, mov, webm, flv, wmv, m4v<br />
Audio: mp3, wav, flac, aac, ogg, m4a
</p>
</div>
<h2>Access Points</h2>
<pre><code># Add to /etc/hosts
<h2 id="chunker-pipeline">Chunker Pipeline</h2>
<div class="diagram-container">
<p>
The chunker pipeline splits media into time-based segments,
streaming real-time events from worker threads through Redis
and gRPC-Web to the browser UI. 7 hops from worker thread to pixel.
</p>
</div>
<div class="legend">
<h3>Event Path</h3>
<pre><code>Worker thread → Pipeline._emit() → event_bridge() → Redis RPUSH
→ [50ms poll] gRPC server LRANGE → yield protobuf
→ HTTP/2 frame → Envoy (grpc-web filter)
→ HTTP/1.1 chunk → nginx (proxy_buffering off)
→ fetch ReadableStream → protobuf-ts decode
→ setEvents([...prev, evt]) → React re-render</code></pre>
</div>
<div class="legend">
<h3>Thread Model (inside Celery worker)</h3>
<pre><code>Celery worker process
└─ run_job task thread
└─ Pipeline.run()
├─ Producer thread — enqueues chunks
├─ Monitor thread — emits progress every 500ms
├─ Worker thread 0 — pulls from queue, processes
├─ Worker thread 1 — pulls from queue, processes
├─ Worker thread 2 — pulls from queue, processes
└─ Worker thread 3 — pulls from queue, processes</code></pre>
</div>
<div class="legend">
<h3>Infrastructure</h3>
<ul>
<li><code>nginx :80</code> - Reverse proxy, static file serving</li>
<li><code>fastapi :8702</code> - GraphQL API (Strawberry)</li>
<li><code>celery</code> - Task worker (runs pipeline)</li>
<li><code>redis :6379</code> - Event bus + Celery broker</li>
<li><code>grpc :50051</code> - gRPC server (StreamChunkPipeline)</li>
<li><code>envoy :8090</code> - gRPC-Web &harr; native gRPC translation</li>
<li><code>minio :9000</code> - S3-compatible source media storage</li>
<li><code>postgres :5432</code> - Job/asset metadata</li>
</ul>
</div>
<p>
<a href="architecture/05-chunker-pipeline.md" target="_blank"
>Full Chunker Pipeline Documentation &rarr;</a
>
</p>
<h2 id="api">API (GraphQL)</h2>
<div class="legend">
<p>
All client interactions go through GraphQL at
<code>/graphql</code>.
</p>
<pre><code># GraphiQL IDE
http://mpr.local.ar/graphql
# Queries
query { assets(status: "ready") { id filename duration } }
query { jobs(status: "processing") { id status progress } }
query { presets { id name container videoCodec } }
query { systemStatus { status version } }
# Mutations
mutation { scanMediaFolder { found registered skipped } }
mutation { createJob(input: { sourceAssetId: "...", presetId: "..." }) { id status } }
mutation { cancelJob(id: "...") { id status } }
mutation { retryJob(id: "...") { id status } }
mutation { updateAsset(id: "...", input: { comments: "..." }) { id comments } }
mutation { deleteAsset(id: "...") { ok } }
# Lambda callback (REST)
POST /api/jobs/{id}/callback - Lambda completion webhook</code></pre>
<p><strong>Supported File Types:</strong></p>
<p>
Video: mp4, mkv, avi, mov, webm, flv, wmv, m4v<br />
Audio: mp3, wav, flac, aac, ogg, m4a
</p>
</div>
<h2 id="access-points">Access Points</h2>
<pre><code># Add to /etc/hosts
127.0.0.1 mpr.local.ar
# URLs
http://mpr.local.ar/admin - Django Admin
http://mpr.local.ar/graphql - GraphiQL IDE
http://mpr.local.ar/ - Timeline UI</code></pre>
http://mpr.local.ar/admin - Django Admin
http://mpr.local.ar/graphql - GraphiQL IDE
http://mpr.local.ar/ - Timeline UI
http://mpr.local.ar/chunker/ - Chunker UI
http://localhost:9001 - MinIO Console
# AWS deployment
https://mpr.mcrn.ar/ - Production</code></pre>
<h2 id="quick-reference">Quick Reference</h2>
<pre><code># Render SVGs from DOT files
for f in docs/architecture/*.dot; do dot -Tsvg "$f" -o "${f%.dot}.svg"; done
# Switch executor mode
MPR_EXECUTOR=local # Celery + MinIO
MPR_EXECUTOR=lambda # Step Functions + Lambda + S3
MPR_EXECUTOR=gcp # Cloud Run Jobs + GCS</code></pre>
</main>
</body>
</html>