chunker ui redo

2026-03-15 16:03:53 -03:00
parent d5a3372d6b
commit b40bd68411
62 changed files with 5460 additions and 1493 deletions


@@ -0,0 +1,290 @@
# Chunker Pipeline — Execution Path
## Overview
The chunker pipeline splits a media file into time-based segments using FFmpeg stream-copy. Events flow from worker threads through Redis and gRPC-Web streaming to the browser UI in real time.
**7 hops from worker thread to pixel:**
```
Worker thread → Pipeline._emit() → event_bridge() → Redis RPUSH
→ [50ms poll] gRPC server LRANGE → yield protobuf
→ HTTP/2 frame → Envoy (grpc-web filter)
→ HTTP/1.1 chunk → nginx (proxy_buffering off)
→ fetch ReadableStream → protobuf-ts decode
→ setEvents([...prev, evt]) → React re-render
```
---
## Step 1: Job Creation (Browser → GraphQL → Celery)
```
User clicks "Start"
→ App.tsx: handleStart(config)
→ api.ts: createChunkJob(config)
→ POST /graphql (nginx :80 → fastapi:8702)
→ graphql.py: Mutation.create_chunk_job()
→ core.db: creates ChunkJob row in Postgres
→ Celery: run_job.delay(job_type="chunk", job_id=..., payload=...)
→ Returns { id, celery_task_id } to browser
→ App.tsx: setJobId(id) — triggers gRPC stream subscription
```
**Files:** `ui/chunker/src/api.ts`, `core/api/graphql.py`, `core/jobs/task.py`
---
## Step 2: gRPC-Web Stream (Browser → nginx → Envoy → gRPC Server)
Once `jobId` is set, `useGrpcStream(jobId)` opens a server-streaming RPC:
```
useGrpcStream(jobId) fires useEffect
→ GrpcWebFetchTransport({ baseUrl: "/grpc-web" })
→ WorkerServiceClient.streamChunkPipeline({ jobId })
→ fetch() POST to /grpc-web/worker.WorkerService/StreamChunkPipeline
→ nginx :80 /grpc-web/ (proxy_pass → envoy:8090, proxy_buffering off)
→ Envoy :8090 (grpc_web filter: HTTP/1.1 grpc-web → HTTP/2 native gRPC)
→ gRPC server :50051 WorkerServicer.StreamChunkPipeline()
→ Enters Redis polling loop (Step 5)
```
**Files:** `ui/chunker/src/hooks/useGrpcStream.ts`, `ctrl/nginx.conf`, `ctrl/envoy.yaml`, `core/rpc/server.py`
**Key nginx config:** `proxy_buffering off` is critical — without it, nginx collects the entire upstream response before forwarding, defeating streaming entirely.
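A minimal sketch of what that location block might look like in `ctrl/nginx.conf`. The directive names are standard nginx; the upstream address and timeout values are assumptions based on the ports listed in this doc, not a copy of the real config:

```nginx
# Illustrative /grpc-web/ location (values assumed from this doc):
location /grpc-web/ {
    proxy_pass http://envoy:8090/;
    proxy_http_version 1.1;
    proxy_buffering off;        # forward each upstream chunk immediately
    proxy_read_timeout 3600s;   # long-lived stream; assumed value
}
```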
---
## Step 3: Celery Worker → ChunkHandler
```
Celery picks up run_job task
→ task.py: run_job(job_type="chunk", job_id, payload)
→ registry.get_handler("chunk") → ChunkHandler
→ chunk.py: ChunkHandler.process(job_id, payload)
→ download_to_temp(BUCKET_IN, source_key) — pulls source from MinIO/S3
→ Creates output_dir: /app/media/out/chunks/{job_id}/
→ Constructs event_bridge callback (bridges Pipeline events → Redis)
→ pipeline = Pipeline(source, ..., event_callback=event_bridge, output_dir=...)
→ pipeline.run()
```
**Files:** `core/jobs/task.py`, `core/jobs/handlers/chunk.py`
The `event_bridge` closure wraps every `Pipeline._emit()` call, forwarding to `push_event(job_id, event_type, data)` which writes to Redis.
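A sketch of that closure, assuming the `push_event(job_id, event_type, data)` signature described in Step 5 (the real implementation lives in `core/jobs/handlers/chunk.py`; the factory name here is illustrative):

```python
def make_event_bridge(job_id, push_event):
    """Bind job_id so every Pipeline._emit(event_type, data) call is
    forwarded to Redis via push_event(job_id, event_type, data)."""
    def event_bridge(event_type, data):
        # Pipeline knows nothing about Redis or job ids; the closure adds both.
        push_event(job_id, event_type, data)
    return event_bridge
```

The pipeline stays Redis-agnostic: it only ever calls `event_callback(event_type, data)`, and the handler decides where those events go.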
---
## Step 4: Pipeline Orchestration (inside Celery worker process)
`Pipeline.run()` spawns multiple threads:
```
pipeline.run():
├─ Chunker(source, chunk_duration)
│ → ffprobe source file → gets duration, file_size
│ → calculates total_chunks = ceil(duration / chunk_duration)
├─ _emit("pipeline_start", {...}) → event_bridge → Redis
├─ _emit("pipeline_info", {file_size, duration, total_chunks}) → Redis
├─ Creates ChunkQueue(maxsize=10)
├─ Creates WorkerPool(num_workers=N, chunk_queue, processor, event_callback)
├─ pool.start() — spawns N worker threads
├─ MONITOR THREAD starts (_monitor_progress)
│ → Every 500ms: _emit("pipeline_progress", {elapsed, throughput_mbps}) → Redis
├─ PRODUCER THREAD starts (_produce_chunks)
│ → Iterates chunker.chunks() → yields Chunk(sequence, start_time, end_time)
│ → For each: chunk_queue.put(chunk)
│ → _emit("chunk_queued", {sequence, start_time, end_time, queue_size}) → Redis
│ → chunk_queue.close() when done (sends N sentinel Nones)
├─ WORKER THREADS (N concurrent, each runs worker.py:Worker.run())
│ │ Each worker loops:
│ │
│ ├─ chunk = chunk_queue.get(timeout=1.0)
│ ├─ _emit("chunk_processing", {sequence, state:"processing", queue_size}) → Redis
│ │
│ ├─ processor.process(chunk)
│ │ ├─ ffmpeg: runs `ffmpeg -ss start -to end -c copy chunk_NNNN.mp4`
│ │ ├─ simulated_decode: sleep(random) + checksum
│ │ └─ checksum: reads bytes, computes hash
│ │
│ ├─ On success: _emit("chunk_done", {sequence, processing_time, retries, queue_size}) → Redis
│ ├─ On failure: retries with exponential backoff (0.1s, 0.2s, 0.4s...)
│ │ └─ _emit("chunk_retry", {sequence, attempt, backoff}) → Redis
│ │ └─ _emit("chunk_error", {sequence, error, retries}) → Redis (after exhaustion)
│ │
│ └─ On sentinel (None): _emit("worker_status", {state:"stopped"}) → Redis
├─ pool.wait() — joins all worker threads, collects results
├─ monitor_stop.set() — stops progress monitor
├─ ResultCollector — reassembles results in sequence order
│ └─ _emit("chunk_collected", {sequence, buffered, emitted}) → Redis
├─ Writes manifest.json to output_dir
└─ _emit("pipeline_complete", {total_chunks, processed, failed, elapsed, throughput}) → Redis
```
**Files:** `core/chunker/pipeline.py`, `core/chunker/worker.py`, `core/chunker/pool.py`, `core/chunker/chunker.py`, `core/chunker/collector.py`
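The worker's retry behaviour above (doubling backoff starting at 0.1s, `chunk_retry` per attempt, `chunk_error` after exhaustion) can be sketched as a standalone loop. Function and parameter names are illustrative, not the real `worker.py` API:

```python
import time

def process_with_retries(process, chunk, emit, max_retries=3, backoff_base=0.1):
    """Retry with exponential backoff: 0.1s, 0.2s, 0.4s, ... by default."""
    for attempt in range(max_retries + 1):
        try:
            result = process(chunk)
            emit("chunk_done", {"sequence": chunk["sequence"], "retries": attempt})
            return result
        except Exception as exc:
            if attempt == max_retries:
                # Retries exhausted: report the terminal failure and re-raise.
                emit("chunk_error", {"sequence": chunk["sequence"],
                                     "error": str(exc), "retries": attempt})
                raise
            backoff = backoff_base * (2 ** attempt)
            emit("chunk_retry", {"sequence": chunk["sequence"],
                                 "attempt": attempt + 1, "backoff": backoff})
            time.sleep(backoff)
```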
---
## Step 5: Redis — the Event Bus
```
WRITE side (Celery worker, all threads):
push_event(job_id, event_type, data)
→ json.dumps({"event": event_type, ...data})
→ Redis RPUSH to key "chunk_events:{job_id}"
→ Redis EXPIRE 3600 (1 hour TTL)
READ side (gRPC server, StreamChunkPipeline):
poll_events(job_id, cursor)
→ Redis LRANGE "chunk_events:{job_id}" cursor -1
→ Returns (parsed_events, new_cursor)
→ Called every 50ms (time.sleep(0.05) in server loop)
```
Redis acts as a decoupling layer between the Celery worker process (which runs the pipeline) and the gRPC server process (which streams to browsers). Events are appended with RPUSH and read with cursor-based LRANGE polling.
**Files:** `core/events.py`
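The RPUSH/LRANGE cursor semantics map cleanly onto a Python list (RPUSH is an append; `LRANGE key cursor -1` is a slice from the cursor to the end). A stand-in class makes the write/read contract explicit — this is an illustration only, the real `core/events.py` talks to Redis:

```python
import json

class InMemoryEventBus:
    """Stand-in for the Redis list "chunk_events:{job_id}"."""
    def __init__(self):
        self.lists = {}

    def push_event(self, job_id, event_type, data):
        payload = json.dumps({"event": event_type, **data})
        self.lists.setdefault(f"chunk_events:{job_id}", []).append(payload)  # RPUSH

    def poll_events(self, job_id, cursor):
        raw = self.lists.get(f"chunk_events:{job_id}", [])[cursor:]  # LRANGE cursor -1
        return [json.loads(r) for r in raw], cursor + len(raw)
```

Because each reader owns its cursor, multiple gRPC streams can replay the same job's events independently, and a browser that connects late still sees the full history (until the 1-hour TTL expires the key).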
---
## Step 6: gRPC Server → Envoy → nginx → Browser
```
server.py: StreamChunkPipeline polling loop:
while context.is_active():
events, cursor = poll_events(job_id, cursor) ← Redis LRANGE
for data in events:
yield worker_pb2.ChunkPipelineEvent( ← serialized protobuf message
job_id, event_type, sequence, worker_id,
state, queue_size, elapsed, throughput_mbps,
total_chunks, processed_chunks, failed_chunks,
error, processing_time, retries
)
if event_type in ("pipeline_complete", "pipeline_error"):
return ← ends the stream
time.sleep(0.05) ← 50ms poll interval
Each yield sends:
→ gRPC HTTP/2 DATA frame to Envoy
→ Envoy grpc_web filter: HTTP/2 → base64-encoded grpc-web-text
→ nginx proxy_pass (proxy_buffering off) → chunked HTTP/1.1 to browser
→ fetch() ReadableStream in GrpcWebFetchTransport
→ @protobuf-ts decodes protobuf → ChunkPipelineEvent TypeScript object
```
**Files:** `core/rpc/server.py`, `ctrl/envoy.yaml`, `ctrl/nginx.conf`, `ui/common/api/grpc/worker.ts`, `ui/common/api/grpc/worker.client.ts`
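The polling loop's control flow — poll a cursor, yield each event, stop on a terminal event — can be sketched as a plain generator. Here `poll(cursor)` stands in for the Redis `LRANGE` call, and protobuf serialization is omitted:

```python
import time

TERMINAL = {"pipeline_complete", "pipeline_error"}

def stream_events(poll, poll_interval=0.05):
    """Yield events from poll(cursor) -> (events, new_cursor) until a
    terminal event arrives, sleeping between empty polls."""
    cursor = 0
    while True:
        events, cursor = poll(cursor)
        for evt in events:
            yield evt
            if evt["event"] in TERMINAL:
                return              # ends the stream, like the gRPC handler
        if not events:
            time.sleep(poll_interval)  # 50ms poll interval by default
```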
---
## Step 7: React State Derivation and Rendering
```
useGrpcStream.ts:
for await (const msg of stream.responses):
const evt = toEvent(msg) ← maps protobuf camelCase → snake_case PipelineEvent
setEvents(prev => [...prev, evt]) ← appends to events array
if pipeline_complete/error → setDone(true), break
App.tsx useMemo(events):
Iterates ALL events on every update, derives:
├─ chunkMap: Map<sequence, ChunkInfo> — state machine per chunk
│ pending → queued → processing → done/error/retry
├─ workerMap: Map<worker_id, WorkerInfo> — state per worker
│ idle → processing → idle → ... → stopped
├─ stats: PipelineStats
│ total_chunks, processed, failed, retries, elapsed, throughput_mbps, queue_size
├─ errors: ErrorEntry[] — every event containing an error field
└─ queueSize: number — last seen queue_size value
Renders:
├─ ChunkGrid — colored cells per chunk (pending/queued/processing/done/error)
├─ QueueGauge — current queue depth / max
├─ WorkerPanel — per-worker state + current chunk assignment
├─ StatsPanel — elapsed time, throughput, processed/failed counts
├─ ErrorLog — scrollable error list
└─ OutputFiles — download links (when done)
```
**Files:** `ui/chunker/src/hooks/useGrpcStream.ts`, `ui/chunker/src/App.tsx`
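The derivation in `App.tsx` is a fold over the full event list. A Python stand-in for the chunk state machine and stats (simplified: the real `useMemo` also tracks workers, errors, and queue depth):

```python
def derive_state(events):
    """Fold events into per-chunk states and summary counters."""
    chunks, stats = {}, {"processed": 0, "failed": 0, "retries": 0}
    state_for = {"chunk_queued": "queued", "chunk_processing": "processing",
                 "chunk_done": "done", "chunk_error": "error",
                 "chunk_retry": "retry"}
    for evt in events:
        kind = evt["event"]
        if kind in state_for:
            # Later events win: queued -> processing -> done/error.
            chunks[evt["sequence"]] = state_for[kind]
        if kind == "chunk_done":
            stats["processed"] += 1
        elif kind == "chunk_error":
            stats["failed"] += 1
        elif kind == "chunk_retry":
            stats["retries"] += 1
    return chunks, stats
```

Re-deriving from the full list on every event is O(n²) over the run but keeps the reducer stateless and trivially correct; for the event volumes here (hundreds of events) that trade-off is reasonable.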
---
## Step 8: Output File Access (after pipeline completes)
```
App.tsx useEffect([done, jobId]):
→ api.ts: getChunkOutputFiles(jobId)
→ POST /graphql → graphql.py: chunk_output_files(job_id)
→ Reads /app/media/out/chunks/{job_id}/ directory listing from disk
→ Returns [{key, size, url: "/media/out/chunks/{job_id}/chunk_0001.mp4"}]
→ Browser renders download links
→ Click link → nginx /media/out/ → alias /app/media/out/ → serves file from disk
```
Chunks are written directly to `media/out/chunks/{job_id}/` by the ffmpeg processor — no MinIO upload needed for output. Nginx serves them with `autoindex on`.
**Files:** `core/api/graphql.py`, `core/jobs/handlers/chunk.py`, `ctrl/nginx.conf`
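The resolver's job reduces to a directory listing plus URL construction. A sketch under the path conventions stated above (the helper name and glob pattern are assumptions, not the real `graphql.py` code):

```python
from pathlib import Path

def chunk_output_files(media_out, job_id):
    """List a job's chunk files and build the nginx-served URLs."""
    out_dir = Path(media_out) / "chunks" / job_id
    return [{"key": p.name,
             "size": p.stat().st_size,
             "url": f"/media/out/chunks/{job_id}/{p.name}"}
            for p in sorted(out_dir.glob("chunk_*.mp4"))]
```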
---
## Event Types Reference
| Event | Source | Key Fields |
|-------|--------|------------|
| `pipeline_start` | Pipeline.run() | source, chunk_duration, num_workers, processor_type |
| `pipeline_info` | Pipeline.run() | file_size, source_duration, total_chunks |
| `pipeline_progress` | Monitor thread (500ms) | elapsed, throughput_mbps |
| `chunk_queued` | Producer thread | sequence, start_time, end_time, duration, queue_size |
| `chunk_processing` | Worker thread | sequence, worker_id, state, queue_size |
| `chunk_done` | Worker thread | sequence, processing_time, retries, queue_size |
| `chunk_retry` | Worker thread | sequence, attempt, backoff |
| `chunk_error` | Worker thread | sequence, error, retries |
| `chunk_collected` | ResultCollector | sequence, buffered, emitted |
| `worker_status` | Worker thread | worker_id, state (idle/processing/stopped) |
| `pipeline_complete` | Pipeline.run() | total_chunks, processed, failed, elapsed, throughput_mbps |
| `pipeline_error` | Pipeline.run() | error |
---
## Thread Model (inside Celery worker)
```
Celery worker process
└─ run_job task thread
└─ Pipeline.run()
├─ Producer thread — enqueues chunks
├─ Monitor thread — emits progress every 500ms
├─ Worker thread 0 — pulls from queue, processes
├─ Worker thread 1 — pulls from queue, processes
├─ Worker thread 2 — pulls from queue, processes
└─ Worker thread 3 — pulls from queue, processes
```
All threads share the same `event_callback` → `event_bridge` → `push_event()` chain, which opens a new Redis connection per call. Thread safety comes from Redis's atomic RPUSH.
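The producer/worker/sentinel pattern above in miniature, using the stdlib `queue.Queue` (a simplification of `pool.py` — no events, retries, or result ordering):

```python
import queue
import threading

def run_pool(items, handle, num_workers=4):
    """Producer enqueues items, then one None sentinel per worker;
    each worker exits when it pulls its sentinel."""
    q = queue.Queue(maxsize=10)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is None:          # sentinel: stop this worker
                return
            out = handle(item)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:                # producer thread role
        q.put(item)
    for _ in threads:                 # close(): N sentinel Nones
        q.put(None)
    for t in threads:                 # pool.wait()
        t.join()
    return results
```

The bounded queue (`maxsize=10`) provides backpressure: the producer blocks once workers fall behind, which is why `chunk_queued` events report `queue_size`.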
---
## Infrastructure
| Service | Port | Role |
|---------|------|------|
| nginx | 80 | Reverse proxy, static file serving |
| fastapi | 8702 | GraphQL API (Strawberry) |
| celery | — | Task worker (runs pipeline) |
| redis | 6379 | Event bus + Celery broker |
| grpc | 50051 | gRPC server (StreamChunkPipeline) |
| envoy | 8090 | gRPC-Web ↔ native gRPC translation |
| minio | 9000 | S3-compatible source media storage |
| postgres | 5432 | Job/asset metadata |


@@ -1,212 +0,0 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>MPR - Architecture</title>
<link rel="stylesheet" href="styles.css" />
</head>
<body>
<h1>MPR - Media Processor</h1>
<p>
Media transcoding platform with dual execution modes: local (Celery
+ MinIO) and cloud (AWS Step Functions + Lambda + S3).
</p>
<nav>
<a href="#overview">System Overview</a>
<a href="#data-model">Data Model</a>
<a href="#job-flow">Job Flow</a>
<a href="#media-storage">Media Storage</a>
</nav>
<h2 id="overview">System Overview</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Local Architecture (Development)</h3>
<object type="image/svg+xml" data="01a-local-architecture.svg">
<img
src="01a-local-architecture.svg"
alt="Local Architecture"
/>
</object>
<a href="01a-local-architecture.svg" target="_blank"
>Open full size</a
>
</div>
<div class="diagram">
<h3>AWS Architecture (Production)</h3>
<object type="image/svg+xml" data="01b-aws-architecture.svg">
<img
src="01b-aws-architecture.svg"
alt="AWS Architecture"
/>
</object>
<a href="01b-aws-architecture.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Components</h3>
<ul>
<li>
<span class="color-box" style="background: #e8f4f8"></span>
Reverse Proxy (nginx)
</li>
<li>
<span class="color-box" style="background: #f0f8e8"></span>
Application Layer (Django Admin, GraphQL API, Timeline UI)
</li>
<li>
<span class="color-box" style="background: #fff8e8"></span>
Worker Layer (Celery local mode)
</li>
<li>
<span class="color-box" style="background: #fde8d0"></span>
AWS (Step Functions, Lambda - cloud mode)
</li>
<li>
<span class="color-box" style="background: #f8e8f0"></span>
Data Layer (PostgreSQL, Redis)
</li>
<li>
<span class="color-box" style="background: #f0f0f0"></span>
S3 Storage (MinIO local / AWS S3 cloud)
</li>
</ul>
</div>
<h2 id="data-model">Data Model</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Entity Relationships</h3>
<object type="image/svg+xml" data="02-data-model.svg">
<img src="02-data-model.svg" alt="Data Model" />
</object>
<a href="02-data-model.svg" target="_blank">Open full size</a>
</div>
</div>
<div class="legend">
<h3>Entities</h3>
<ul>
<li>
<span class="color-box" style="background: #4a90d9"></span>
MediaAsset - Video/audio files (S3 keys as paths)
</li>
<li>
<span class="color-box" style="background: #50b050"></span>
TranscodePreset - Encoding configurations
</li>
<li>
<span class="color-box" style="background: #d9534f"></span>
TranscodeJob - Processing queue (celery_task_id or
execution_arn)
</li>
</ul>
</div>
<h2 id="job-flow">Job Flow</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Job Lifecycle</h3>
<object type="image/svg+xml" data="03-job-flow.svg">
<img src="03-job-flow.svg" alt="Job Flow" />
</object>
<a href="03-job-flow.svg" target="_blank">Open full size</a>
</div>
</div>
<div class="legend">
<h3>Job States</h3>
<ul>
<li>
<span class="color-box" style="background: #ffc107"></span>
PENDING - Waiting in queue
</li>
<li>
<span class="color-box" style="background: #17a2b8"></span>
PROCESSING - Worker executing
</li>
<li>
<span class="color-box" style="background: #28a745"></span>
COMPLETED - Success
</li>
<li>
<span class="color-box" style="background: #dc3545"></span>
FAILED - Error occurred
</li>
<li>
<span class="color-box" style="background: #6c757d"></span>
CANCELLED - User cancelled
</li>
</ul>
<h3>Execution Modes</h3>
<ul>
<li>
<span class="color-box" style="background: #e8f4e8"></span>
Local: Celery + MinIO (S3 API) + FFmpeg
</li>
<li>
<span class="color-box" style="background: #fde8d0"></span>
Lambda: Step Functions + Lambda + AWS S3
</li>
</ul>
</div>
<h2 id="media-storage">Media Storage</h2>
<div class="diagram-container">
<p>
MPR separates media into input and output paths for flexible
storage configuration.
</p>
<p>
<a href="04-media-storage.md" target="_blank"
>View Media Storage Documentation →</a
>
</p>
</div>
<h2>API (GraphQL)</h2>
<pre><code># GraphiQL IDE
http://mpr.local.ar/graphql
# Queries
query { assets(status: "ready") { id filename duration } }
query { jobs(status: "processing") { id status progress } }
query { presets { id name container videoCodec } }
query { systemStatus { status version } }
# Mutations
mutation { scanMediaFolder { found registered skipped } }
mutation { createJob(input: { sourceAssetId: "...", presetId: "..." }) { id status } }
mutation { cancelJob(id: "...") { id status } }
mutation { retryJob(id: "...") { id status } }
mutation { updateAsset(id: "...", input: { comments: "..." }) { id comments } }
mutation { deleteAsset(id: "...") { ok } }
# Lambda callback (REST)
POST /api/jobs/{id}/callback - Lambda completion webhook</code></pre>
<h2>Access Points</h2>
<pre><code># Local development
127.0.0.1 mpr.local.ar
http://mpr.local.ar/admin - Django Admin
http://mpr.local.ar/graphql - GraphiQL
http://mpr.local.ar/ - Timeline UI
http://localhost:9001 - MinIO Console
# AWS deployment
https://mpr.mcrn.ar/ - Production</code></pre>
<h2>Quick Reference</h2>
<pre><code># Render SVGs from DOT files
for f in *.dot; do dot -Tsvg "$f" -o "${f%.dot}.svg"; done
# Switch executor mode
MPR_EXECUTOR=local # Celery + MinIO
MPR_EXECUTOR=lambda # Step Functions + Lambda + S3</code></pre>
</body>
</html>


@@ -3,6 +3,8 @@
--text-color: #e8e8e8;
--accent-color: #4a90d9;
--border-color: #333;
--sidebar-width: 220px;
--sidebar-bg: #151528;
}
* {
@@ -16,6 +18,59 @@ body {
background-color: var(--bg-color);
color: var(--text-color);
line-height: 1.6;
}
/* Sidebar navigation */
.sidebar {
position: fixed;
top: 0;
left: 0;
width: var(--sidebar-width);
height: 100vh;
background: var(--sidebar-bg);
border-right: 1px solid var(--border-color);
padding: 1.5rem 1rem;
overflow-y: auto;
z-index: 10;
}
.sidebar h2 {
font-size: 1.2rem;
color: var(--accent-color);
margin-bottom: 1.5rem;
padding-bottom: 0.5rem;
border-bottom: 1px solid var(--border-color);
}
.sidebar ul {
list-style: none;
display: flex;
flex-direction: column;
gap: 0.25rem;
}
.sidebar li {
display: block;
}
.sidebar a {
display: block;
padding: 0.4rem 0.6rem;
color: var(--text-color);
text-decoration: none;
font-size: 0.85rem;
border-radius: 4px;
transition: background 0.15s, color 0.15s;
}
.sidebar a:hover {
background: rgba(74, 144, 217, 0.15);
color: var(--accent-color);
}
/* Main content */
.content {
margin-left: var(--sidebar-width);
padding: 2rem;
}
@@ -25,12 +80,13 @@ h1 {
color: var(--accent-color);
}
h2 {
.content > h2 {
font-size: 1.5rem;
margin: 2rem 0 1rem;
color: var(--text-color);
border-bottom: 1px solid var(--border-color);
padding-bottom: 0.5rem;
scroll-margin-top: 1rem;
}
.diagram-container {
@@ -76,20 +132,6 @@ h2 {
text-decoration: underline;
}
nav {
margin-bottom: 2rem;
}
nav a {
color: var(--accent-color);
text-decoration: none;
margin-right: 1.5rem;
}
nav a:hover {
text-decoration: underline;
}
.legend {
margin-top: 2rem;
padding: 1rem;
@@ -141,3 +183,27 @@ pre code {
background: none;
padding: 0;
}
/* Responsive: collapse sidebar on small screens */
@media (max-width: 768px) {
.sidebar {
position: static;
width: 100%;
height: auto;
border-right: none;
border-bottom: 1px solid var(--border-color);
}
.sidebar ul {
flex-direction: row;
flex-wrap: wrap;
}
.content {
margin-left: 0;
}
.diagram {
min-width: 100%;
}
}


@@ -7,219 +7,241 @@
<link rel="stylesheet" href="architecture/styles.css" />
</head>
<body>
<h1>MPR - Media Processor</h1>
<p>
Media transcoding platform with three execution modes: local (Celery
+ MinIO), AWS (Step Functions + Lambda + S3), and GCP (Cloud Run
Jobs + GCS). Storage is S3-compatible across all environments.
</p>
<nav>
<a href="#overview">System Overview</a>
<a href="#data-model">Data Model</a>
<a href="#job-flow">Job Flow</a>
<a href="#media-storage">Media Storage</a>
<nav class="sidebar">
<h2>MPR</h2>
<ul>
<li><a href="#overview">System Overview</a></li>
<li><a href="#data-model">Data Model</a></li>
<li><a href="#job-flow">Job Flow</a></li>
<li><a href="#media-storage">Media Storage</a></li>
<li><a href="#chunker-pipeline">Chunker Pipeline</a></li>
<li><a href="#api">API (GraphQL)</a></li>
<li><a href="#access-points">Access Points</a></li>
<li><a href="#quick-reference">Quick Reference</a></li>
</ul>
</nav>
<h2 id="overview">System Overview</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Local Architecture (Development)</h3>
<object
type="image/svg+xml"
data="architecture/01a-local-architecture.svg"
>
<img
src="architecture/01a-local-architecture.svg"
alt="Local Architecture"
/>
</object>
<a
href="architecture/01a-local-architecture.svg"
target="_blank"
>Open full size</a
>
</div>
<div class="diagram">
<h3>AWS Architecture (Production)</h3>
<object
type="image/svg+xml"
data="architecture/01b-aws-architecture.svg"
>
<img
src="architecture/01b-aws-architecture.svg"
alt="AWS Architecture"
/>
</object>
<a href="architecture/01b-aws-architecture.svg" target="_blank"
>Open full size</a
>
</div>
<div class="diagram">
<h3>GCP Architecture (Production)</h3>
<object
type="image/svg+xml"
data="architecture/01c-gcp-architecture.svg"
>
<img
src="architecture/01c-gcp-architecture.svg"
alt="GCP Architecture"
/>
</object>
<a href="architecture/01c-gcp-architecture.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Components</h3>
<ul>
<li>
<span class="color-box" style="background: #e8f4f8"></span>
Reverse Proxy (nginx)
</li>
<li>
<span class="color-box" style="background: #f0f8e8"></span>
Application Layer (Django Admin, GraphQL API, Timeline UI)
</li>
<li>
<span class="color-box" style="background: #fff8e8"></span>
Worker Layer (Celery local mode)
</li>
<li>
<span class="color-box" style="background: #fde8d0"></span>
AWS (Step Functions, Lambda)
</li>
<li>
<span class="color-box" style="background: #e8f0fd"></span>
GCP (Cloud Run Jobs + GCS)
</li>
<li>
<span class="color-box" style="background: #f8e8f0"></span>
Data Layer (PostgreSQL, Redis)
</li>
<li>
<span class="color-box" style="background: #f0f0f0"></span>
S3-compatible Storage (MinIO / AWS S3 / GCS)
</li>
</ul>
</div>
<h2 id="data-model">Data Model</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Entity Relationships</h3>
<object
type="image/svg+xml"
data="architecture/02-data-model.svg"
>
<img
src="architecture/02-data-model.svg"
alt="Data Model"
/>
</object>
<a href="architecture/02-data-model.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Entities</h3>
<ul>
<li>
<span class="color-box" style="background: #4a90d9"></span>
MediaAsset - Video/audio files with metadata
</li>
<li>
<span class="color-box" style="background: #50b050"></span>
TranscodePreset - Encoding configurations
</li>
<li>
<span class="color-box" style="background: #d9534f"></span>
TranscodeJob - Processing queue items
</li>
</ul>
</div>
<h2 id="job-flow">Job Flow</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Job Lifecycle</h3>
<object
type="image/svg+xml"
data="architecture/03-job-flow.svg"
>
<img src="architecture/03-job-flow.svg" alt="Job Flow" />
</object>
<a href="architecture/03-job-flow.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Job States</h3>
<ul>
<li>
<span class="color-box" style="background: #ffc107"></span>
PENDING - Waiting in queue
</li>
<li>
<span class="color-box" style="background: #17a2b8"></span>
PROCESSING - Worker executing
</li>
<li>
<span class="color-box" style="background: #28a745"></span>
COMPLETED - Success
</li>
<li>
<span class="color-box" style="background: #dc3545"></span>
FAILED - Error occurred
</li>
<li>
<span class="color-box" style="background: #6c757d"></span>
CANCELLED - User cancelled
</li>
</ul>
</div>
<h2 id="media-storage">Media Storage</h2>
<div class="diagram-container">
<main class="content">
<h1>MPR - Media Processor</h1>
<p>
MPR separates media into <strong>input</strong> and
<strong>output</strong> paths, each independently configurable.
File paths are stored
<strong>relative to their respective root</strong> to ensure
portability between local development and cloud deployments (AWS
S3, etc.).
Media transcoding platform with three execution modes: local (Celery
+ MinIO), AWS (Step Functions + Lambda + S3), and GCP (Cloud Run
Jobs + GCS). Storage is S3-compatible across all environments.
</p>
</div>
<div class="legend">
<h3>Input / Output Separation</h3>
<ul>
<li>
<span class="color-box" style="background: #4a90d9"></span>
<code>MEDIA_IN</code> - Source media files to process
</li>
<li>
<span class="color-box" style="background: #50b050"></span>
<code>MEDIA_OUT</code> - Transcoded/trimmed output files
</li>
</ul>
<p><strong>Why Relative Paths?</strong></p>
<ul>
<li>Portability: Same database works locally and in cloud</li>
<li>Flexibility: Easy to switch between storage backends</li>
<li>Simplicity: No need to update paths when migrating</li>
</ul>
</div>
<h2 id="overview">System Overview</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Local Architecture (Development)</h3>
<object
type="image/svg+xml"
data="architecture/01a-local-architecture.svg"
>
<img
src="architecture/01a-local-architecture.svg"
alt="Local Architecture"
/>
</object>
<a
href="architecture/01a-local-architecture.svg"
target="_blank"
>Open full size</a
>
</div>
<div class="diagram">
<h3>AWS Architecture (Production)</h3>
<object
type="image/svg+xml"
data="architecture/01b-aws-architecture.svg"
>
<img
src="architecture/01b-aws-architecture.svg"
alt="AWS Architecture"
/>
</object>
<a href="architecture/01b-aws-architecture.svg" target="_blank"
>Open full size</a
>
</div>
<div class="diagram">
<h3>GCP Architecture (Production)</h3>
<object
type="image/svg+xml"
data="architecture/01c-gcp-architecture.svg"
>
<img
src="architecture/01c-gcp-architecture.svg"
alt="GCP Architecture"
/>
</object>
<a href="architecture/01c-gcp-architecture.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Local Development</h3>
<pre><code>MEDIA_IN=/app/media/in
<div class="legend">
<h3>Components</h3>
<ul>
<li>
<span class="color-box" style="background: #e8f4f8"></span>
Reverse Proxy (nginx)
</li>
<li>
<span class="color-box" style="background: #f0f8e8"></span>
Application Layer (Django Admin, GraphQL API, Timeline UI)
</li>
<li>
<span class="color-box" style="background: #fff8e8"></span>
Worker Layer (Celery local mode)
</li>
<li>
<span class="color-box" style="background: #fde8d0"></span>
AWS (Step Functions, Lambda)
</li>
<li>
<span class="color-box" style="background: #e8f0fd"></span>
GCP (Cloud Run Jobs + GCS)
</li>
<li>
<span class="color-box" style="background: #f8e8f0"></span>
Data Layer (PostgreSQL, Redis)
</li>
<li>
<span class="color-box" style="background: #f0f0f0"></span>
S3-compatible Storage (MinIO / AWS S3 / GCS)
</li>
</ul>
</div>
<h2 id="data-model">Data Model</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Entity Relationships</h3>
<object
type="image/svg+xml"
data="architecture/02-data-model.svg"
>
<img
src="architecture/02-data-model.svg"
alt="Data Model"
/>
</object>
<a href="architecture/02-data-model.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Entities</h3>
<ul>
<li>
<span class="color-box" style="background: #4a90d9"></span>
MediaAsset - Video/audio files with metadata
</li>
<li>
<span class="color-box" style="background: #50b050"></span>
TranscodePreset - Encoding configurations
</li>
<li>
<span class="color-box" style="background: #d9534f"></span>
TranscodeJob - Processing queue items
</li>
</ul>
</div>
<h2 id="job-flow">Job Flow</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Job Lifecycle</h3>
<object
type="image/svg+xml"
data="architecture/03-job-flow.svg"
>
<img src="architecture/03-job-flow.svg" alt="Job Flow" />
</object>
<a href="architecture/03-job-flow.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Job States</h3>
<ul>
<li>
<span class="color-box" style="background: #ffc107"></span>
PENDING - Waiting in queue
</li>
<li>
<span class="color-box" style="background: #17a2b8"></span>
PROCESSING - Worker executing
</li>
<li>
<span class="color-box" style="background: #28a745"></span>
COMPLETED - Success
</li>
<li>
<span class="color-box" style="background: #dc3545"></span>
FAILED - Error occurred
</li>
<li>
<span class="color-box" style="background: #6c757d"></span>
CANCELLED - User cancelled
</li>
</ul>
<h3>Execution Modes</h3>
<ul>
<li>
<span class="color-box" style="background: #e8f4e8"></span>
Local: Celery + MinIO (S3 API) + FFmpeg
</li>
<li>
<span class="color-box" style="background: #fde8d0"></span>
Lambda: Step Functions + Lambda + AWS S3
</li>
<li>
<span class="color-box" style="background: #e8f0fd"></span>
GCP: Cloud Run Jobs + GCS (S3 compat)
</li>
</ul>
</div>
<h2 id="media-storage">Media Storage</h2>
<div class="diagram-container">
<p>
MPR separates media into <strong>input</strong> and
<strong>output</strong> paths, each independently configurable.
File paths are stored
<strong>relative to their respective root</strong> to ensure
portability between local development and cloud deployments.
</p>
</div>
<div class="legend">
<h3>Input / Output Separation</h3>
<ul>
<li>
<span class="color-box" style="background: #4a90d9"></span>
<code>MEDIA_IN</code> - Source media files to process
</li>
<li>
<span class="color-box" style="background: #50b050"></span>
<code>MEDIA_OUT</code> - Transcoded/trimmed output files
</li>
</ul>
<p><strong>Why Relative Paths?</strong></p>
<ul>
<li>Portability: Same database works locally and in cloud</li>
<li>Flexibility: Easy to switch between storage backends</li>
<li>Simplicity: No need to update paths when migrating</li>
</ul>
</div>
<div class="legend">
<h3>Local Development</h3>
<pre><code>MEDIA_IN=/app/media/in
MEDIA_OUT=/app/media/out
/app/media/
@@ -228,52 +250,131 @@ MEDIA_OUT=/app/media/out
│ └── subfolder/video3.mp4
└── out/ # Transcoded output
└── video1_h264.mp4</code></pre>
</div>
</div>
<div class="legend">
<h3>AWS/Cloud Deployment</h3>
<pre><code>MEDIA_IN=s3://source-bucket/media/
<div class="legend">
<h3>AWS/Cloud Deployment</h3>
<pre><code>MEDIA_IN=s3://source-bucket/media/
MEDIA_OUT=s3://output-bucket/transcoded/
MEDIA_BASE_URL=https://source-bucket.s3.amazonaws.com/media/</code></pre>
<p>
Database paths remain unchanged (already relative). Just upload
files to S3 and update environment variables.
</p>
</div>
<p>
Database paths remain unchanged (already relative). Just upload
files to S3 and update environment variables.
</p>
</div>
<div class="legend">
<h3>API (GraphQL)</h3>
<p>
All client interactions go through GraphQL at
<code>/graphql</code>.
<a href="architecture/04-media-storage.md" target="_blank"
>Full Media Storage Documentation &rarr;</a
>
</p>
<ul>
<li>
<code>scanMediaFolder</code> - Scan S3 bucket for media
files
</li>
<li><code>createJob</code> - Create transcode/trim job</li>
<li>
<code>cancelJob / retryJob</code> - Job lifecycle management
</li>
<li>
<code>updateAsset / deleteAsset</code> - Asset management
</li>
</ul>
<p><strong>Supported File Types:</strong></p>
<p>
Video: mp4, mkv, avi, mov, webm, flv, wmv, m4v<br />
Audio: mp3, wav, flac, aac, ogg, m4a
</p>
</div>
<h2>Access Points</h2>
<pre><code># Add to /etc/hosts
<h2 id="chunker-pipeline">Chunker Pipeline</h2>
<div class="diagram-container">
<p>
The chunker pipeline splits media into time-based segments,
streaming real-time events from worker threads through Redis
and gRPC-Web to the browser UI. 7 hops from worker thread to pixel.
</p>
</div>
<div class="legend">
<h3>Event Path</h3>
<pre><code>Worker thread → Pipeline._emit() → event_bridge() → Redis RPUSH
→ [50ms poll] gRPC server LRANGE → yield protobuf
→ HTTP/2 frame → Envoy (grpc-web filter)
→ HTTP/1.1 chunk → nginx (proxy_buffering off)
→ fetch ReadableStream → protobuf-ts decode
→ setEvents([...prev, evt]) → React re-render</code></pre>
</div>
<div class="legend">
<h3>Thread Model (inside Celery worker)</h3>
<pre><code>Celery worker process
└─ run_job task thread
└─ Pipeline.run()
├─ Producer thread — enqueues chunks
├─ Monitor thread — emits progress every 500ms
├─ Worker thread 0 — pulls from queue, processes
├─ Worker thread 1 — pulls from queue, processes
├─ Worker thread 2 — pulls from queue, processes
└─ Worker thread 3 — pulls from queue, processes</code></pre>
</div>
<div class="legend">
<h3>Infrastructure</h3>
<ul>
<li><code>nginx :80</code> - Reverse proxy, static file serving</li>
<li><code>fastapi :8702</code> - GraphQL API (Strawberry)</li>
<li><code>celery</code> - Task worker (runs pipeline)</li>
<li><code>redis :6379</code> - Event bus + Celery broker</li>
<li><code>grpc :50051</code> - gRPC server (StreamChunkPipeline)</li>
<li><code>envoy :8090</code> - gRPC-Web &harr; native gRPC translation</li>
<li><code>minio :9000</code> - S3-compatible source media storage</li>
<li><code>postgres :5432</code> - Job/asset metadata</li>
</ul>
</div>
<p>
<a href="architecture/05-chunker-pipeline.md" target="_blank"
>Full Chunker Pipeline Documentation &rarr;</a
>
</p>
<h2 id="api">API (GraphQL)</h2>
<div class="legend">
<p>
All client interactions go through GraphQL at
<code>/graphql</code>.
</p>
<pre><code># GraphiQL IDE
http://mpr.local.ar/graphql
# Queries
query { assets(status: "ready") { id filename duration } }
query { jobs(status: "processing") { id status progress } }
query { presets { id name container videoCodec } }
query { systemStatus { status version } }
# Mutations
mutation { scanMediaFolder { found registered skipped } }
mutation { createJob(input: { sourceAssetId: "...", presetId: "..." }) { id status } }
mutation { cancelJob(id: "...") { id status } }
mutation { retryJob(id: "...") { id status } }
mutation { updateAsset(id: "...", input: { comments: "..." }) { id comments } }
mutation { deleteAsset(id: "...") { ok } }
# Lambda callback (REST)
POST /api/jobs/{id}/callback - Lambda completion webhook</code></pre>
<p><strong>Supported File Types:</strong></p>
<p>
Video: mp4, mkv, avi, mov, webm, flv, wmv, m4v<br />
Audio: mp3, wav, flac, aac, ogg, m4a
</p>
</div>
<h2 id="access-points">Access Points</h2>
<pre><code># Add to /etc/hosts
127.0.0.1 mpr.local.ar
# URLs
http://mpr.local.ar/admin - Django Admin
http://mpr.local.ar/graphql - GraphiQL IDE
http://mpr.local.ar/ - Timeline UI</code></pre>
http://mpr.local.ar/admin - Django Admin
http://mpr.local.ar/graphql - GraphiQL IDE
http://mpr.local.ar/ - Timeline UI
http://mpr.local.ar/chunker/ - Chunker UI
http://localhost:9001 - MinIO Console
# AWS deployment
https://mpr.mcrn.ar/ - Production</code></pre>
<h2 id="quick-reference">Quick Reference</h2>
<pre><code># Render SVGs from DOT files
for f in docs/architecture/*.dot; do dot -Tsvg "$f" -o "${f%.dot}.svg"; done
# Switch executor mode
MPR_EXECUTOR=local # Celery + MinIO
MPR_EXECUTOR=lambda # Step Functions + Lambda + S3
MPR_EXECUTOR=gcp # Cloud Run Jobs + GCS</code></pre>
</main>
</body>
</html>