# Media Storage Architecture

## Overview

MPR uses S3-compatible storage everywhere: MinIO locally, AWS S3 in production. The same boto3 code and S3 keys work in both environments; the only difference is the `S3_ENDPOINT_URL` env var.
## Storage Strategy

### S3 Buckets

| Bucket | Env Var | Purpose |
|---|---|---|
| `mpr-media-in` | `S3_BUCKET_IN` | Source media files |
| `mpr-media-out` | `S3_BUCKET_OUT` | Transcoded/trimmed output |
### S3 Keys as File Paths

- Database: stores S3 object keys (e.g., `video1.mp4`, `subfolder/video3.mp4`)
- Local dev: MinIO serves these via the S3 API on port 9000
- AWS: real S3, same keys, different endpoint
### Why S3 Everywhere?

- Identical code paths - no branching between local and cloud
- Seamless executor switching - Celery and Lambda both use boto3
- Cloud-native - ready for production without refactoring
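The endpoint-only difference can be sketched as a small helper that builds boto3 client arguments from the environment. The helper name `s3_client_kwargs` is illustrative, not part of `core/storage`:

```python
import os

def s3_client_kwargs() -> dict:
    """Build boto3 client kwargs from the environment.

    If S3_ENDPOINT_URL is set (MinIO, GCS compat), the client talks to that
    endpoint; if it is unset, boto3 defaults to real AWS S3.
    """
    kwargs = {"service_name": "s3"}
    endpoint = os.environ.get("S3_ENDPOINT_URL")
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    region = os.environ.get("AWS_REGION")
    if region:
        kwargs["region_name"] = region
    return kwargs

# Hypothetical usage: client = boto3.client(**s3_client_kwargs())
```

Because the endpoint lives in configuration rather than code, nothing branches on "local vs. cloud" at the call sites.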
## Local Development (MinIO)

### Configuration

```bash
S3_ENDPOINT_URL=http://minio:9000
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin
```
### How It Works

- MinIO runs as a Docker container (port 9000 API, port 9001 console)
- The `minio-init` container creates buckets and sets public read access on startup
- Nginx proxies `/media/in/` and `/media/out/` to MinIO buckets
- Upload files via the MinIO Console (http://localhost:9001) or the `mc` CLI
### Upload Files to MinIO

```bash
# Using mc CLI
mc alias set local http://localhost:9000 minioadmin minioadmin
mc cp video.mp4 local/mpr-media-in/

# Using aws CLI with endpoint override
aws --endpoint-url http://localhost:9000 s3 cp video.mp4 s3://mpr-media-in/
```
## AWS Production (S3)

### Configuration

```bash
# No S3_ENDPOINT_URL = uses real AWS S3
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=<real-key>
AWS_SECRET_ACCESS_KEY=<real-secret>
```
### Upload Files to S3

```bash
aws s3 cp video.mp4 s3://mpr-media-in/
aws s3 sync /local/media/ s3://mpr-media-in/
```
## GCP Production (GCS via S3 compatibility)

GCS exposes an S3-compatible API. The same `core/storage/s3.py` boto3 code works with no changes; only the endpoint and credentials differ.

### GCS HMAC Keys

Generate them under Cloud Storage → Settings → Interoperability in the GCP console. They act as `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`.
### Configuration

```bash
S3_ENDPOINT_URL=https://storage.googleapis.com
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_ACCESS_KEY_ID=<GCS HMAC access key>
AWS_SECRET_ACCESS_KEY=<GCS HMAC secret>

# Executor
MPR_EXECUTOR=gcp
GCP_PROJECT_ID=my-project
GCP_REGION=us-central1
CLOUD_RUN_JOB=mpr-transcode
CALLBACK_URL=https://mpr.mcrn.ar/api
CALLBACK_API_KEY=<secret>
```
### Upload Files to GCS

```bash
gcloud storage cp video.mp4 gs://mpr-media-in/

# Or with the aws CLI via the compat endpoint
aws --endpoint-url https://storage.googleapis.com s3 cp video.mp4 s3://mpr-media-in/
```
### Cloud Run Job Handler

`core/task/gcp_handler.py` is the Cloud Run Job entrypoint. It reads the job payload from `MPR_JOB_PAYLOAD` (injected by `GCPExecutor`), uses `core/storage` for all GCS access (S3 compat), and POSTs the completion callback to the API.

Set the Cloud Run Job command to: `python -m core.task.gcp_handler`
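A minimal sketch of how the handler might read the injected payload. The payload fields shown in the usage comment are assumptions; the real schema is defined by `GCPExecutor`:

```python
import json
import os

def load_job_payload() -> dict:
    """Parse the JSON job payload injected by the executor via MPR_JOB_PAYLOAD."""
    raw = os.environ.get("MPR_JOB_PAYLOAD")
    if not raw:
        raise RuntimeError(
            "MPR_JOB_PAYLOAD is not set; was the job started by GCPExecutor?"
        )
    return json.loads(raw)

# Hypothetical usage inside the handler:
# payload = load_job_payload()
# source_key, output_key = payload["source_key"], payload["output_key"]
```

Passing the payload through an env var keeps the Cloud Run Job command static; each execution only overrides the environment.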
## Storage Module

The `core/storage/` package provides all S3 operations:

```python
from core.storage import (
    get_s3_client,      # boto3 client (MinIO or AWS)
    list_objects,       # List bucket contents, filter by extension
    download_file,      # Download S3 object to local path
    download_to_temp,   # Download to temp file (caller cleans up)
    upload_file,        # Upload local file to S3
    get_presigned_url,  # Generate presigned URL
    BUCKET_IN,          # Input bucket name
    BUCKET_OUT,         # Output bucket name
)
```
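For illustration, the extension filtering that `list_objects` offers could look like the standalone sketch below. This is not the actual implementation (which also talks to the bucket via boto3), and the function name is hypothetical:

```python
def filter_keys_by_extension(keys, extensions=None):
    """Keep only keys whose suffix matches one of the given extensions.

    `extensions` is an iterable like {".mp4", ".mkv"}; None means no filtering.
    Matching is case-insensitive, mirroring typical media-file handling.
    """
    if extensions is None:
        return list(keys)
    wanted = {ext.lower() for ext in extensions}
    return [k for k in keys if any(k.lower().endswith(ext) for ext in wanted)]

filter_keys_by_extension(["a.MP4", "b.txt", "dir/c.mkv"], {".mp4", ".mkv"})
# → ["a.MP4", "dir/c.mkv"]
```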
## API Endpoints

### Scan Media (REST)

`POST /api/assets/scan`

Lists objects in `S3_BUCKET_IN` and registers new media files.
### Scan Media (GraphQL)

```graphql
mutation { scanMediaFolder { found registered skipped files } }
```
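The bookkeeping behind `found` / `registered` / `skipped` can be sketched as a pure function. The exact field semantics (e.g. whether `files` lists only newly registered keys) are assumptions based on the mutation's return shape:

```python
def summarize_scan(bucket_keys, known_keys):
    """Split bucket contents into newly registered vs. already-known keys."""
    known = set(known_keys)
    registered = [k for k in bucket_keys if k not in known]
    skipped = [k for k in bucket_keys if k in known]
    return {
        "found": len(bucket_keys),
        "registered": len(registered),
        "skipped": len(skipped),
        "files": registered,  # assumed: only the newly registered keys
    }

summarize_scan(["a.mp4", "b.mp4"], ["a.mp4"])
# → {"found": 2, "registered": 1, "skipped": 1, "files": ["b.mp4"]}
```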
## Job Flow with S3

### Local Mode (Celery)

- Celery task receives `source_key` and `output_key`
- Downloads source from `S3_BUCKET_IN` to a temp file
- Runs FFmpeg locally
- Uploads result to `S3_BUCKET_OUT`
- Cleans up temp files
### Lambda Mode (AWS)

- Step Functions invokes Lambda with S3 keys
- Lambda downloads source from `S3_BUCKET_IN` to `/tmp`
- Runs FFmpeg in container
- Uploads result to `S3_BUCKET_OUT`
- Calls back to API with result
Cloud Run Job Mode (GCP)
GCPExecutortriggers Cloud Run Job with payload inMPR_JOB_PAYLOADcore/task/gcp_handler.pydownloads source fromS3_BUCKET_IN(GCS S3 compat)- Runs FFmpeg in container
- Uploads result to
S3_BUCKET_OUT(GCS S3 compat) - Calls back to API with result
All three paths use the same S3-compatible bucket names and key structure.
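Since the three flows share the same shape, the common pipeline can be sketched with injected steps. The function and parameter names here are illustrative, not the actual task code:

```python
def run_media_job(source_key, output_key, download, transcode, upload):
    """Shared download -> transcode -> upload pipeline used by every executor.

    `download(key)` returns a local path, `transcode(path)` returns the output
    path, and `upload(path, key)` pushes the result to the output bucket.
    """
    local_in = download(source_key)
    local_out = transcode(local_in)
    upload(local_out, output_key)
    return output_key
```

Each executor then only supplies its own transport: Celery and Lambda download with boto3 against their endpoint, and the Cloud Run handler does the same through the GCS compat endpoint.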
## Supported File Types

- Video: `.mp4`, `.mkv`, `.avi`, `.mov`, `.webm`, `.flv`, `.wmv`, `.m4v`
- Audio: `.mp3`, `.wav`, `.flac`, `.aac`, `.ogg`, `.m4a`
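The extension lists above translate directly into a small classifier. The sets mirror the lists above; the helper name `media_kind` is illustrative:

```python
from pathlib import Path

VIDEO_EXTS = {".mp4", ".mkv", ".avi", ".mov", ".webm", ".flv", ".wmv", ".m4v"}
AUDIO_EXTS = {".mp3", ".wav", ".flac", ".aac", ".ogg", ".m4a"}

def media_kind(key: str):
    """Classify an S3 key as 'video', 'audio', or None by its extension."""
    ext = Path(key).suffix.lower()
    if ext in VIDEO_EXTS:
        return "video"
    if ext in AUDIO_EXTS:
        return "audio"
    return None

media_kind("subfolder/video3.MP4")  # → "video"
media_kind("notes.txt")             # → None
```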