chunker ui redo

2026-03-15 16:03:53 -03:00
parent d5a3372d6b
commit b40bd68411
62 changed files with 5460 additions and 1493 deletions


@@ -7,219 +7,241 @@
<link rel="stylesheet" href="architecture/styles.css" />
</head>
<body>
<h1>MPR - Media Processor</h1>
<p>
Media transcoding platform with three execution modes: local (Celery
+ MinIO), AWS (Step Functions + Lambda + S3), and GCP (Cloud Run
Jobs + GCS). Storage is S3-compatible across all environments.
</p>
<nav>
<a href="#overview">System Overview</a>
<a href="#data-model">Data Model</a>
<a href="#job-flow">Job Flow</a>
<a href="#media-storage">Media Storage</a>
<nav class="sidebar">
<h2>MPR</h2>
<ul>
<li><a href="#overview">System Overview</a></li>
<li><a href="#data-model">Data Model</a></li>
<li><a href="#job-flow">Job Flow</a></li>
<li><a href="#media-storage">Media Storage</a></li>
<li><a href="#chunker-pipeline">Chunker Pipeline</a></li>
<li><a href="#api">API (GraphQL)</a></li>
<li><a href="#access-points">Access Points</a></li>
<li><a href="#quick-reference">Quick Reference</a></li>
</ul>
</nav>
<h2 id="overview">System Overview</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Local Architecture (Development)</h3>
<object
type="image/svg+xml"
data="architecture/01a-local-architecture.svg"
>
<img
src="architecture/01a-local-architecture.svg"
alt="Local Architecture"
/>
</object>
<a
href="architecture/01a-local-architecture.svg"
target="_blank"
>Open full size</a
>
</div>
<div class="diagram">
<h3>AWS Architecture (Production)</h3>
<object
type="image/svg+xml"
data="architecture/01b-aws-architecture.svg"
>
<img
src="architecture/01b-aws-architecture.svg"
alt="AWS Architecture"
/>
</object>
<a href="architecture/01b-aws-architecture.svg" target="_blank"
>Open full size</a
>
</div>
<div class="diagram">
<h3>GCP Architecture (Production)</h3>
<object
type="image/svg+xml"
data="architecture/01c-gcp-architecture.svg"
>
<img
src="architecture/01c-gcp-architecture.svg"
alt="GCP Architecture"
/>
</object>
<a href="architecture/01c-gcp-architecture.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Components</h3>
<ul>
<li>
<span class="color-box" style="background: #e8f4f8"></span>
Reverse Proxy (nginx)
</li>
<li>
<span class="color-box" style="background: #f0f8e8"></span>
Application Layer (Django Admin, GraphQL API, Timeline UI)
</li>
<li>
<span class="color-box" style="background: #fff8e8"></span>
Worker Layer (Celery local mode)
</li>
<li>
<span class="color-box" style="background: #fde8d0"></span>
AWS (Step Functions, Lambda)
</li>
<li>
<span class="color-box" style="background: #e8f0fd"></span>
GCP (Cloud Run Jobs + GCS)
</li>
<li>
<span class="color-box" style="background: #f8e8f0"></span>
Data Layer (PostgreSQL, Redis)
</li>
<li>
<span class="color-box" style="background: #f0f0f0"></span>
S3-compatible Storage (MinIO / AWS S3 / GCS)
</li>
</ul>
</div>
<h2 id="data-model">Data Model</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Entity Relationships</h3>
<object
type="image/svg+xml"
data="architecture/02-data-model.svg"
>
<img
src="architecture/02-data-model.svg"
alt="Data Model"
/>
</object>
<a href="architecture/02-data-model.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Entities</h3>
<ul>
<li>
<span class="color-box" style="background: #4a90d9"></span>
MediaAsset - Video/audio files with metadata
</li>
<li>
<span class="color-box" style="background: #50b050"></span>
TranscodePreset - Encoding configurations
</li>
<li>
<span class="color-box" style="background: #d9534f"></span>
TranscodeJob - Processing queue items
</li>
</ul>
</div>
<h2 id="job-flow">Job Flow</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Job Lifecycle</h3>
<object
type="image/svg+xml"
data="architecture/03-job-flow.svg"
>
<img src="architecture/03-job-flow.svg" alt="Job Flow" />
</object>
<a href="architecture/03-job-flow.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Job States</h3>
<ul>
<li>
<span class="color-box" style="background: #ffc107"></span>
PENDING - Waiting in queue
</li>
<li>
<span class="color-box" style="background: #17a2b8"></span>
PROCESSING - Worker executing
</li>
<li>
<span class="color-box" style="background: #28a745"></span>
COMPLETED - Success
</li>
<li>
<span class="color-box" style="background: #dc3545"></span>
FAILED - Error occurred
</li>
<li>
<span class="color-box" style="background: #6c757d"></span>
CANCELLED - User cancelled
</li>
</ul>
</div>
<h2 id="media-storage">Media Storage</h2>
<div class="diagram-container">
<main class="content">
<h1>MPR - Media Processor</h1>
<p>
MPR separates media into <strong>input</strong> and
<strong>output</strong> paths, each independently configurable.
File paths are stored
<strong>relative to their respective root</strong> to ensure
portability between local development and cloud deployments (AWS
S3, etc.).
Media transcoding platform with three execution modes: local (Celery
+ MinIO), AWS (Step Functions + Lambda + S3), and GCP (Cloud Run
Jobs + GCS). Storage is S3-compatible across all environments.
</p>
</div>
<div class="legend">
<h3>Input / Output Separation</h3>
<ul>
<li>
<span class="color-box" style="background: #4a90d9"></span>
<code>MEDIA_IN</code> - Source media files to process
</li>
<li>
<span class="color-box" style="background: #50b050"></span>
<code>MEDIA_OUT</code> - Transcoded/trimmed output files
</li>
</ul>
<p><strong>Why Relative Paths?</strong></p>
<ul>
<li>Portability: Same database works locally and in cloud</li>
<li>Flexibility: Easy to switch between storage backends</li>
<li>Simplicity: No need to update paths when migrating</li>
</ul>
</div>
<h2 id="overview">System Overview</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Local Architecture (Development)</h3>
<object
type="image/svg+xml"
data="architecture/01a-local-architecture.svg"
>
<img
src="architecture/01a-local-architecture.svg"
alt="Local Architecture"
/>
</object>
<a
href="architecture/01a-local-architecture.svg"
target="_blank"
>Open full size</a
>
</div>
<div class="diagram">
<h3>AWS Architecture (Production)</h3>
<object
type="image/svg+xml"
data="architecture/01b-aws-architecture.svg"
>
<img
src="architecture/01b-aws-architecture.svg"
alt="AWS Architecture"
/>
</object>
<a href="architecture/01b-aws-architecture.svg" target="_blank"
>Open full size</a
>
</div>
<div class="diagram">
<h3>GCP Architecture (Production)</h3>
<object
type="image/svg+xml"
data="architecture/01c-gcp-architecture.svg"
>
<img
src="architecture/01c-gcp-architecture.svg"
alt="GCP Architecture"
/>
</object>
<a href="architecture/01c-gcp-architecture.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Local Development</h3>
<pre><code>MEDIA_IN=/app/media/in
<div class="legend">
<h3>Components</h3>
<ul>
<li>
<span class="color-box" style="background: #e8f4f8"></span>
Reverse Proxy (nginx)
</li>
<li>
<span class="color-box" style="background: #f0f8e8"></span>
Application Layer (Django Admin, GraphQL API, Timeline UI)
</li>
<li>
<span class="color-box" style="background: #fff8e8"></span>
Worker Layer (Celery local mode)
</li>
<li>
<span class="color-box" style="background: #fde8d0"></span>
AWS (Step Functions, Lambda)
</li>
<li>
<span class="color-box" style="background: #e8f0fd"></span>
GCP (Cloud Run Jobs + GCS)
</li>
<li>
<span class="color-box" style="background: #f8e8f0"></span>
Data Layer (PostgreSQL, Redis)
</li>
<li>
<span class="color-box" style="background: #f0f0f0"></span>
S3-compatible Storage (MinIO / AWS S3 / GCS)
</li>
</ul>
</div>
<h2 id="data-model">Data Model</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Entity Relationships</h3>
<object
type="image/svg+xml"
data="architecture/02-data-model.svg"
>
<img
src="architecture/02-data-model.svg"
alt="Data Model"
/>
</object>
<a href="architecture/02-data-model.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Entities</h3>
<ul>
<li>
<span class="color-box" style="background: #4a90d9"></span>
MediaAsset - Video/audio files with metadata
</li>
<li>
<span class="color-box" style="background: #50b050"></span>
TranscodePreset - Encoding configurations
</li>
<li>
<span class="color-box" style="background: #d9534f"></span>
TranscodeJob - Processing queue items
</li>
</ul>
</div>
<h2 id="job-flow">Job Flow</h2>
<div class="diagram-container">
<div class="diagram">
<h3>Job Lifecycle</h3>
<object
type="image/svg+xml"
data="architecture/03-job-flow.svg"
>
<img src="architecture/03-job-flow.svg" alt="Job Flow" />
</object>
<a href="architecture/03-job-flow.svg" target="_blank"
>Open full size</a
>
</div>
</div>
<div class="legend">
<h3>Job States</h3>
<ul>
<li>
<span class="color-box" style="background: #ffc107"></span>
PENDING - Waiting in queue
</li>
<li>
<span class="color-box" style="background: #17a2b8"></span>
PROCESSING - Worker executing
</li>
<li>
<span class="color-box" style="background: #28a745"></span>
COMPLETED - Success
</li>
<li>
<span class="color-box" style="background: #dc3545"></span>
FAILED - Error occurred
</li>
<li>
<span class="color-box" style="background: #6c757d"></span>
CANCELLED - User cancelled
</li>
</ul>
<h3>Execution Modes</h3>
<ul>
<li>
<span class="color-box" style="background: #e8f4e8"></span>
Local: Celery + MinIO (S3 API) + FFmpeg
</li>
<li>
<span class="color-box" style="background: #fde8d0"></span>
Lambda: Step Functions + Lambda + AWS S3
</li>
<li>
<span class="color-box" style="background: #e8f0fd"></span>
GCP: Cloud Run Jobs + GCS (S3-compatible)
</li>
</ul>
</div>
<h2 id="media-storage">Media Storage</h2>
<div class="diagram-container">
<p>
MPR separates media into <strong>input</strong> and
<strong>output</strong> paths, each independently configurable.
File paths are stored
<strong>relative to their respective root</strong> to ensure
portability between local development and cloud deployments.
</p>
</div>
<div class="legend">
<h3>Input / Output Separation</h3>
<ul>
<li>
<span class="color-box" style="background: #4a90d9"></span>
<code>MEDIA_IN</code> - Source media files to process
</li>
<li>
<span class="color-box" style="background: #50b050"></span>
<code>MEDIA_OUT</code> - Transcoded/trimmed output files
</li>
</ul>
<p><strong>Why Relative Paths?</strong></p>
<ul>
<li>Portability: Same database works locally and in cloud</li>
<li>Flexibility: Easy to switch between storage backends</li>
<li>Simplicity: No need to update paths when migrating</li>
</ul>
</div>
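<div class="legend">
<p>
The relative-path scheme can be sketched as a small resolver that
joins a configured root with a stored path. This is an illustration
only: <code>resolve_media_path</code> is a hypothetical helper, not a
function from the MPR codebase, and the roots are the example values
used elsewhere on this page.
</p>

```python
import posixpath


def resolve_media_path(root: str, relative_path: str) -> str:
    """Join a configured media root with a path stored relative to it.

    Hypothetical helper: the name and behaviour are assumptions made
    for illustration, not MPR's actual implementation.
    """
    rel = relative_path.lstrip("/")
    if "://" in root:
        # URL-style roots (for example s3://bucket/prefix/) are joined as strings.
        return root.rstrip("/") + "/" + rel
    # Filesystem roots are joined with POSIX path rules.
    return posixpath.join(root, rel)


# The same stored value resolves against either root.
stored = "subfolder/video3.mp4"
local = resolve_media_path("/app/media/in", stored)
cloud = resolve_media_path("s3://source-bucket/media/", stored)
print(local)
print(cloud)
```

<p>
Because only the root changes between environments, the stored value
stays identical when the deployment moves from local disk to a bucket.
</p>
</div>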
<div class="legend">
<h3>Local Development</h3>
<pre><code>MEDIA_IN=/app/media/in
MEDIA_OUT=/app/media/out
/app/media/
@@ -228,52 +250,131 @@ MEDIA_OUT=/app/media/out
│ └── subfolder/video3.mp4
└── out/ # Transcoded output
└── video1_h264.mp4</code></pre>
</div>
</div>
<div class="legend">
<h3>AWS/Cloud Deployment</h3>
<pre><code>MEDIA_IN=s3://source-bucket/media/
<div class="legend">
<h3>AWS/Cloud Deployment</h3>
<pre><code>MEDIA_IN=s3://source-bucket/media/
MEDIA_OUT=s3://output-bucket/transcoded/
MEDIA_BASE_URL=https://source-bucket.s3.amazonaws.com/media/</code></pre>
<p>
Database paths remain unchanged (already relative). Just upload
files to S3 and update environment variables.
</p>
</div>
<p>
Database paths remain unchanged (already relative). Just upload
files to S3 and update environment variables.
</p>
</div>
<div class="legend">
<h3>API (GraphQL)</h3>
<p>
All client interactions go through GraphQL at
<code>/graphql</code>.
<a href="architecture/04-media-storage.md" target="_blank"
>Full Media Storage Documentation &rarr;</a
>
</p>
<ul>
<li>
<code>scanMediaFolder</code> - Scan S3 bucket for media
files
</li>
<li><code>createJob</code> - Create transcode/trim job</li>
<li>
<code>cancelJob / retryJob</code> - Job lifecycle management
</li>
<li>
<code>updateAsset / deleteAsset</code> - Asset management
</li>
</ul>
<p><strong>Supported File Types:</strong></p>
<p>
Video: mp4, mkv, avi, mov, webm, flv, wmv, m4v<br />
Audio: mp3, wav, flac, aac, ogg, m4a
</p>
</div>
<h2>Access Points</h2>
<pre><code># Add to /etc/hosts
<h2 id="chunker-pipeline">Chunker Pipeline</h2>
<div class="diagram-container">
<p>
The chunker pipeline splits media into time-based segments and
streams real-time events from worker threads through Redis and
gRPC-Web to the browser UI. Each event takes 7 hops from worker
thread to pixel.
</p>
</div>
<div class="legend">
<h3>Event Path</h3>
<pre><code>Worker thread → Pipeline._emit() → event_bridge() → Redis RPUSH
→ [50ms poll] gRPC server LRANGE → yield protobuf
→ HTTP/2 frame → Envoy (grpc-web filter)
→ HTTP/1.1 chunk → nginx (proxy_buffering off)
→ fetch ReadableStream → protobuf-ts decode
→ setEvents([...prev, evt]) → React re-render</code></pre>
</div>
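<div class="legend">
<p>
The first two stages of the event path (append with RPUSH, then poll
with LRANGE from a cursor) can be sketched as follows. This is a
minimal sketch under stated assumptions: <code>InMemoryList</code> is
a stand-in for a single Redis list so the example runs without a
server, and <code>emit</code>, <code>stream_events</code>, and the
<code>done</code> event type are hypothetical names, not the actual
<code>Pipeline._emit()</code> or gRPC server code.
</p>

```python
import json
import time
from typing import Iterator, List


class InMemoryList:
    """Stand-in for one Redis list; supports only what the sketch needs."""

    def __init__(self) -> None:
        self._items: List[str] = []

    def rpush(self, value: str) -> int:
        self._items.append(value)
        return len(self._items)

    def lrange(self, start: int, stop: int) -> List[str]:
        # Redis LRANGE has an inclusive stop index; -1 means the last element.
        # Other negative indexes are not handled by this stand-in.
        end = len(self._items) if stop == -1 else stop + 1
        return self._items[start:end]


def emit(events: InMemoryList, kind: str, **payload) -> None:
    """Producer side: serialize one event and append it to the list."""
    events.rpush(json.dumps({"type": kind, **payload}))


def stream_events(events: InMemoryList, poll_interval: float = 0.05,
                  stop_type: str = "done") -> Iterator[dict]:
    """Consumer side: poll for entries past the cursor and yield them in order."""
    cursor = 0
    while True:
        batch = events.lrange(cursor, -1)
        cursor += len(batch)
        for raw in batch:
            event = json.loads(raw)
            yield event
            if event["type"] == stop_type:
                return
        time.sleep(poll_interval)  # 50 ms by default, matching the poll above


bus = InMemoryList()
emit(bus, "chunk_done", index=0)
emit(bus, "chunk_done", index=1)
emit(bus, "done")
received = list(stream_events(bus))
print([event["type"] for event in received])
```

<p>
Keeping a cursor instead of popping entries leaves the list intact, so
a reader that connects late can still replay events from index 0.
</p>
</div>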
<div class="legend">
<h3>Thread Model (inside Celery worker)</h3>
<pre><code>Celery worker process
└─ run_job task thread
└─ Pipeline.run()
├─ Producer thread — enqueues chunks
├─ Monitor thread — emits progress every 500ms
├─ Worker thread 0 — pulls from queue, processes
├─ Worker thread 1 — pulls from queue, processes
├─ Worker thread 2 — pulls from queue, processes
└─ Worker thread 3 — pulls from queue, processes</code></pre>
</div>
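<div class="legend">
<p>
The thread layout above can be approximated with the standard library:
one producer, four workers sharing a queue, and a monitor that samples
progress on an interval. This is a sketch of the pattern, not the real
<code>Pipeline.run()</code>; <code>run_pipeline</code>, the sentinel
shutdown, and the placeholder per-chunk work are all assumptions.
</p>

```python
import queue
import threading

NUM_WORKERS = 4
SENTINEL = None  # tells a worker that no more chunks are coming


def run_pipeline(chunk_count: int, monitor_interval: float = 0.5) -> dict:
    chunks: "queue.Queue" = queue.Queue()
    processed = []
    progress_reports = []
    lock = threading.Lock()
    finished = threading.Event()

    def producer() -> None:
        for index in range(chunk_count):
            chunks.put(index)
        # One sentinel per worker so every worker thread exits.
        for _ in range(NUM_WORKERS):
            chunks.put(SENTINEL)

    def worker() -> None:
        while True:
            index = chunks.get()
            if index is SENTINEL:
                return
            # Placeholder for the real per-chunk processing.
            with lock:
                processed.append(index)

    def monitor() -> None:
        # Event.wait returns False on timeout and True once finished is set.
        while not finished.wait(monitor_interval):
            with lock:
                progress_reports.append(len(processed))

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    monitor_thread = threading.Thread(target=monitor)
    for thread in threads + [monitor_thread]:
        thread.start()
    for thread in threads:
        thread.join()
    finished.set()
    monitor_thread.join()
    return {"processed": sorted(processed), "reports": progress_reports}


result = run_pipeline(chunk_count=10, monitor_interval=0.01)
print(result["processed"])
```

<p>
Chunks may complete in any order across workers, which is why the
result is sorted before it is returned.
</p>
</div>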
<div class="legend">
<h3>Infrastructure</h3>
<ul>
<li><code>nginx :80</code> - Reverse proxy, static file serving</li>
<li><code>fastapi :8702</code> - GraphQL API (Strawberry)</li>
<li><code>celery</code> - Task worker (runs pipeline)</li>
<li><code>redis :6379</code> - Event bus + Celery broker</li>
<li><code>grpc :50051</code> - gRPC server (StreamChunkPipeline)</li>
<li><code>envoy :8090</code> - gRPC-Web &harr; native gRPC translation</li>
<li><code>minio :9000</code> - S3-compatible source media storage</li>
<li><code>postgres :5432</code> - Job/asset metadata</li>
</ul>
</div>
<p>
<a href="architecture/05-chunker-pipeline.md" target="_blank"
>Full Chunker Pipeline Documentation &rarr;</a
>
</p>
<h2 id="api">API (GraphQL)</h2>
<div class="legend">
<p>
All client interactions go through GraphQL at
<code>/graphql</code>.
</p>
<pre><code># GraphiQL IDE
http://mpr.local.ar/graphql
# Queries
query { assets(status: "ready") { id filename duration } }
query { jobs(status: "processing") { id status progress } }
query { presets { id name container videoCodec } }
query { systemStatus { status version } }
# Mutations
mutation { scanMediaFolder { found registered skipped } }
mutation { createJob(input: { sourceAssetId: "...", presetId: "..." }) { id status } }
mutation { cancelJob(id: "...") { id status } }
mutation { retryJob(id: "...") { id status } }
mutation { updateAsset(id: "...", input: { comments: "..." }) { id comments } }
mutation { deleteAsset(id: "...") { ok } }
# Lambda callback (REST)
POST /api/jobs/{id}/callback - Lambda completion webhook</code></pre>
<p><strong>Supported File Types:</strong></p>
<p>
Video: mp4, mkv, avi, mov, webm, flv, wmv, m4v<br />
Audio: mp3, wav, flac, aac, ogg, m4a
</p>
</div>
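<div class="legend">
<p>
Outside GraphiQL, the same queries can be sent as a JSON POST. The
sketch below only builds the request, so it runs without a server. It
assumes the endpoint accepts the conventional
<code>{"query": ...}</code> JSON body, which is the usual
GraphQL-over-HTTP shape but is not confirmed by this page;
<code>build_graphql_request</code> is a hypothetical helper.
</p>

```python
import json
import urllib.request

GRAPHQL_URL = "http://mpr.local.ar/graphql"  # local URL from the Access Points section


def build_graphql_request(query: str, url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Wrap a GraphQL document in a JSON POST request (assumed body shape)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(
    'query { jobs(status: "processing") { id status progress } }'
)
print(request.get_method(), request.full_url)

# To send it against a running instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```
</div>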
<h2 id="access-points">Access Points</h2>
<pre><code># Add to /etc/hosts
127.0.0.1 mpr.local.ar
# URLs
http://mpr.local.ar/admin - Django Admin
http://mpr.local.ar/graphql - GraphiQL IDE
http://mpr.local.ar/ - Timeline UI</code></pre>
http://mpr.local.ar/admin - Django Admin
http://mpr.local.ar/graphql - GraphiQL IDE
http://mpr.local.ar/ - Timeline UI
http://mpr.local.ar/chunker/ - Chunker UI
http://localhost:9001 - MinIO Console
# AWS deployment
https://mpr.mcrn.ar/ - Production</code></pre>
<h2 id="quick-reference">Quick Reference</h2>
<pre><code># Render SVGs from DOT files
for f in docs/architecture/*.dot; do dot -Tsvg "$f" -o "${f%.dot}.svg"; done
# Switch executor mode
MPR_EXECUTOR=local # Celery + MinIO
MPR_EXECUTOR=lambda # Step Functions + Lambda + S3
MPR_EXECUTOR=gcp # Cloud Run Jobs + GCS</code></pre>
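<div class="legend">
<p>
One way a service could read the executor switch is sketched below.
The three mode names come from the list above; the
<code>select_executor</code> helper, the fallback to
<code>local</code> when the variable is unset, and the error on
unknown values are assumptions for illustration, not documented MPR
behaviour.
</p>

```python
import os
from typing import Mapping, Optional

# Mode names and descriptions as listed in the quick reference.
EXECUTORS = {
    "local": "Celery + MinIO",
    "lambda": "Step Functions + Lambda + S3",
    "gcp": "Cloud Run Jobs + GCS",
}


def select_executor(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured executor mode, validating it against known modes."""
    env = os.environ if env is None else env
    mode = env.get("MPR_EXECUTOR", "local")  # default is an assumption
    if mode not in EXECUTORS:
        known = ", ".join(sorted(EXECUTORS))
        raise ValueError(f"Unknown MPR_EXECUTOR {mode!r}; expected one of: {known}")
    return mode


mode = select_executor({"MPR_EXECUTOR": "gcp"})
print(mode, "->", EXECUTORS[mode])
```
</div>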
</main>
</body>
</html>