mediaproc/docs/architecture/04-media-storage.md
2026-03-13 01:07:02 -03:00


Media Storage Architecture

Overview

MPR uses S3-compatible storage everywhere: MinIO locally, AWS S3 (or GCS via its S3-compatible API) in production. The same boto3 code and S3 keys work in every environment - the only differences are the S3_ENDPOINT_URL env var and the credentials.

Storage Strategy

S3 Buckets

Bucket         Env Var        Purpose
mpr-media-in   S3_BUCKET_IN   Source media files
mpr-media-out  S3_BUCKET_OUT  Transcoded/trimmed output

S3 Keys as File Paths

  • Database: Stores S3 object keys (e.g., video1.mp4, subfolder/video3.mp4)
  • Local dev: MinIO serves these via S3 API on port 9000
  • AWS: Real S3, same keys, different endpoint

Why S3 Everywhere?

  1. Identical code paths - no branching between local and cloud
  2. Seamless executor switching - Celery, Lambda, and Cloud Run Jobs all use boto3
  3. Cloud-native - ready for production without refactoring
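
The endpoint switch behind these points can be sketched as a small client factory. This is a hedged sketch, not the project's actual helper (which lives in core/storage): the pure kwargs builder is separated out so the env-var logic is visible, and credentials are left to boto3's standard AWS_* resolution.

```python
import os

def s3_client_kwargs() -> dict:
    """Build boto3 client kwargs from the environment.

    If S3_ENDPOINT_URL is set (MinIO, GCS), point the client at it;
    if unset, boto3 defaults to real AWS S3. Credentials come from the
    standard AWS_* env vars, which boto3 resolves on its own.
    """
    kwargs = {}
    endpoint = os.environ.get("S3_ENDPOINT_URL")
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    return kwargs

def get_s3_client():
    # Imported lazily so the kwargs helper stays usable without boto3 installed.
    import boto3
    return boto3.client("s3", **s3_client_kwargs())
```

Because the branch lives entirely in configuration, callers never check which backend they are talking to.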

Local Development (MinIO)

Configuration

S3_ENDPOINT_URL=http://minio:9000
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin

How It Works

  • MinIO runs as a Docker container (port 9000 API, port 9001 console)
  • minio-init container creates buckets and sets public read access on startup
  • Nginx proxies /media/in/ and /media/out/ to MinIO buckets
  • Upload files via MinIO Console (http://localhost:9001) or mc CLI

Upload Files to MinIO

# Using mc CLI
mc alias set local http://localhost:9000 minioadmin minioadmin
mc cp video.mp4 local/mpr-media-in/

# Using aws CLI with endpoint override
aws --endpoint-url http://localhost:9000 s3 cp video.mp4 s3://mpr-media-in/

AWS Production (S3)

Configuration

# No S3_ENDPOINT_URL = uses real AWS S3
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=<real-key>
AWS_SECRET_ACCESS_KEY=<real-secret>

Upload Files to S3

aws s3 cp video.mp4 s3://mpr-media-in/
aws s3 sync /local/media/ s3://mpr-media-in/

GCP Production (GCS via S3 compatibility)

GCS exposes an S3-compatible API. The same core/storage/s3.py boto3 code works with no changes — only the endpoint and credentials differ.

GCS HMAC Keys

Generate under Cloud Storage → Settings → Interoperability in the GCP console. These act as AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.

Configuration

S3_ENDPOINT_URL=https://storage.googleapis.com
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_ACCESS_KEY_ID=<GCS HMAC access key>
AWS_SECRET_ACCESS_KEY=<GCS HMAC secret>

# Executor
MPR_EXECUTOR=gcp
GCP_PROJECT_ID=my-project
GCP_REGION=us-central1
CLOUD_RUN_JOB=mpr-transcode
CALLBACK_URL=https://mpr.mcrn.ar/api
CALLBACK_API_KEY=<secret>

Upload Files to GCS

gcloud storage cp video.mp4 gs://mpr-media-in/

# Or with the aws CLI via compat endpoint
aws --endpoint-url https://storage.googleapis.com s3 cp video.mp4 s3://mpr-media-in/

Cloud Run Job Handler

core/task/gcp_handler.py is the Cloud Run Job entrypoint. It reads the job payload from MPR_JOB_PAYLOAD (injected by GCPExecutor), uses core/storage for all GCS access (S3 compat), and POSTs the completion callback to the API.

Set the Cloud Run Job command to: python -m core.task.gcp_handler
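
The MPR_JOB_PAYLOAD handoff described above can be sketched as follows. The payload's field names are hypothetical - the real shape is defined by GCPExecutor and core/task/gcp_handler.py; only the env-var mechanics are shown here.

```python
import json
import os

def read_job_payload() -> dict:
    """Parse the JSON job payload injected by GCPExecutor.

    Cloud Run Jobs have no request body, so the executor passes the
    payload through the MPR_JOB_PAYLOAD environment variable.
    """
    raw = os.environ.get("MPR_JOB_PAYLOAD")
    if not raw:
        raise RuntimeError(
            "MPR_JOB_PAYLOAD not set - was this job started by GCPExecutor?"
        )
    return json.loads(raw)
```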

Storage Module

The core/storage/ package provides all S3 operations:

from core.storage import (
    get_s3_client,     # boto3 client (MinIO or AWS)
    list_objects,      # List bucket contents, filter by extension
    download_file,     # Download S3 object to local path
    download_to_temp,  # Download to temp file (caller cleans up)
    upload_file,       # Upload local file to S3
    get_presigned_url, # Generate presigned URL
    BUCKET_IN,         # Input bucket name
    BUCKET_OUT,        # Output bucket name
)
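
As an illustration of the temp-file contract, here is a minimal sketch of what a download_to_temp-style helper might look like. It is not the real implementation: the client is passed in explicitly (the package presumably uses get_s3_client internally), and only boto3's download_fileobj call is assumed.

```python
import tempfile

def download_to_temp(client, bucket: str, key: str) -> str:
    """Download an S3 object to a named temp file; the caller deletes it.

    The key's extension is preserved so tools like FFmpeg can sniff the
    container format from the filename.
    """
    suffix = "." + key.rsplit(".", 1)[-1] if "." in key else ""
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        client.download_fileobj(bucket, key, tmp)
    finally:
        tmp.close()
    return tmp.name
```

Because `client` only needs boto3's download_fileobj signature, the same helper works against MinIO, AWS S3, or GCS.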

API Endpoints

Scan Media (REST)

POST /api/assets/scan

Lists objects in S3_BUCKET_IN, registers new media files.

Scan Media (GraphQL)

mutation { scanMediaFolder { found registered skipped files } }
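
Both endpoints report the same found/registered/skipped/files counters, so the shared scan logic reduces to a set difference. A minimal sketch with the I/O injected (helper and parameter names are hypothetical; the real listing is done via list_objects and registration goes through the database):

```python
def scan_media(list_keys, known) -> dict:
    """Compare bucket contents against already-registered keys.

    list_keys: callable returning the media keys currently in S3_BUCKET_IN.
    known: set of S3 keys already registered in the database.
    """
    found = list(list_keys())
    new = [k for k in found if k not in known]
    return {
        "found": len(found),
        "registered": len(new),
        "skipped": len(found) - len(new),
        "files": new,
    }
```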

Job Flow with S3

Local Mode (Celery)

  1. Celery task receives source_key and output_key
  2. Downloads source from S3_BUCKET_IN to temp file
  3. Runs FFmpeg locally
  4. Uploads result to S3_BUCKET_OUT
  5. Cleans up temp files

Lambda Mode (AWS)

  1. Step Functions invokes Lambda with S3 keys
  2. Lambda downloads source from S3_BUCKET_IN to /tmp
  3. Runs FFmpeg in container
  4. Uploads result to S3_BUCKET_OUT
  5. Calls back to API with result

Cloud Run Job Mode (GCP)

  1. GCPExecutor triggers Cloud Run Job with payload in MPR_JOB_PAYLOAD
  2. core/task/gcp_handler.py downloads source from S3_BUCKET_IN (GCS S3 compat)
  3. Runs FFmpeg in container
  4. Uploads result to S3_BUCKET_OUT (GCS S3 compat)
  5. Calls back to API with result

All three paths use the same S3-compatible bucket names and key structure.
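
The three flows share one download-process-upload skeleton. A hedged sketch of that shared shape, with the FFmpeg step injected as a callable so the structure stays testable (function and parameter names are hypothetical, not the project's actual API):

```python
import os
import tempfile

def process_media(client, source_key, output_key, transcode,
                  bucket_in="mpr-media-in", bucket_out="mpr-media-out"):
    """Skeleton common to the Celery, Lambda, and Cloud Run Job paths.

    transcode(src_path, dst_path) stands in for the FFmpeg invocation:
    it reads the local input file and writes the local output file.
    """
    src = tempfile.NamedTemporaryFile(delete=False); src.close()
    dst = tempfile.NamedTemporaryFile(delete=False); dst.close()
    try:
        client.download_file(bucket_in, source_key, src.name)   # step 2
        transcode(src.name, dst.name)                           # step 3
        client.upload_file(dst.name, bucket_out, output_key)    # step 4
    finally:
        os.unlink(src.name)                                     # step 5
        os.unlink(dst.name)
```

Only the surrounding plumbing differs per executor: Celery cleans up and returns, while Lambda and Cloud Run Jobs additionally POST a completion callback to the API.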

Supported File Types

Video: .mp4, .mkv, .avi, .mov, .webm, .flv, .wmv, .m4v
Audio: .mp3, .wav, .flac, .aac, .ogg, .m4a
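
These lists are what the scan step filters against. A minimal sketch of such a check (the real filtering is done by list_objects in core/storage; the constant and function names here are hypothetical):

```python
import os.path

VIDEO_EXTS = {".mp4", ".mkv", ".avi", ".mov", ".webm", ".flv", ".wmv", ".m4v"}
AUDIO_EXTS = {".mp3", ".wav", ".flac", ".aac", ".ogg", ".m4a"}

def is_supported(key: str) -> bool:
    """True if the S3 key's extension is a supported media type (case-insensitive)."""
    return os.path.splitext(key)[1].lower() in VIDEO_EXTS | AUDIO_EXTS
```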