Media Storage Architecture

Overview

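MPR uses S3-compatible storage in every environment: MinIO for local development and AWS S3 in production. The same boto3 code and S3 keys work in both. The only setting that changes the storage target is the S3_ENDPOINT_URL environment variable, which is set to the MinIO URL locally and left unset on AWS.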

Storage Strategy

S3 Buckets

| Bucket | Env var | Purpose |
| --- | --- | --- |
| mpr-media-in | S3_BUCKET_IN | Source media files |
| mpr-media-out | S3_BUCKET_OUT | Transcoded/trimmed output |
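The following is a minimal sketch of how these variables and the S3_ENDPOINT_URL switch might be read in Python. StorageSettings, load_settings(), make_s3_client() and the default bucket names are illustrative assumptions, not the actual contents of core/storage.py. When endpoint_url is None, boto3 uses its default AWS S3 endpoint, so the same call serves both MinIO and AWS.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StorageSettings:
    # Hypothetical settings container; field names are illustrative.
    endpoint_url: Optional[str]  # None means "use the default AWS S3 endpoint"
    region_name: Optional[str]
    bucket_in: str
    bucket_out: str


def load_settings(env: Mapping[str, str] = os.environ) -> StorageSettings:
    """Read storage settings from environment variables (defaults are assumed)."""
    return StorageSettings(
        # An empty S3_ENDPOINT_URL is treated the same as an unset one.
        endpoint_url=env.get("S3_ENDPOINT_URL") or None,
        region_name=env.get("AWS_REGION") or None,
        bucket_in=env.get("S3_BUCKET_IN", "mpr-media-in"),
        bucket_out=env.get("S3_BUCKET_OUT", "mpr-media-out"),
    )


def make_s3_client(settings: StorageSettings):
    """Build a boto3 S3 client for MinIO (endpoint set) or AWS (endpoint unset)."""
    import boto3  # imported here so load_settings() works without boto3 installed

    # Access keys are picked up from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.
    return boto3.client(
        "s3",
        endpoint_url=settings.endpoint_url,
        region_name=settings.region_name,
    )
```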

S3 Keys as File Paths

  • Database: stores S3 object keys (e.g., video1.mp4, subfolder/video3.mp4), not local filesystem paths
  • Local dev: MinIO serves these keys through the S3 API on port 9000
  • AWS: real S3 serves the same keys from a different endpoint
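The helper below is hypothetical and not part of the documented API. It shows how a file under a local media folder maps to the kind of key the database stores. The naming is the same one aws s3 sync uses: the path relative to the source directory, joined with forward slashes.

```python
from pathlib import Path, PurePosixPath


def key_for(local_file: Path, media_root: Path) -> str:
    """Turn a file under media_root into an S3 key such as 'subfolder/video3.mp4'.

    S3 has no real directories: 'subfolder/' is just part of the key string,
    so keys always use '/' separators and never start with '/'.
    """
    relative = local_file.relative_to(media_root)  # raises ValueError if outside root
    return PurePosixPath(*relative.parts).as_posix()
```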

Why S3 Everywhere?

  1. Identical code paths: no branching between local and cloud
  2. Seamless executor switching: Celery and Lambda both read and write through boto3
  3. Cloud-native: ready for production without refactoring

Local Development (MinIO)

Configuration

S3_ENDPOINT_URL=http://minio:9000
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin

How It Works

  • MinIO runs as a Docker container (port 9000 for the API, port 9001 for the console)
  • A minio-init container creates the buckets and sets public read access on startup
  • Nginx proxies /media/in/ and /media/out/ to the corresponding MinIO buckets
  • Upload files via the MinIO Console (http://localhost:9001) or the mc CLI
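The sketch below turns a stored key into a browser URL through the Nginx proxy. It assumes that /media/in/<key> and /media/out/<key> map one-to-one onto object keys in the input and output buckets; the exact Nginx rules are not shown here. Plain URLs like these work without signing only because minio-init grants public read access. media_url and PROXY_PREFIXES are hypothetical names.

```python
from urllib.parse import quote

# Assumed mapping: /media/in/<key> -> mpr-media-in/<key>, /media/out/<key> -> mpr-media-out/<key>
PROXY_PREFIXES = {"in": "/media/in/", "out": "/media/out/"}


def media_url(side: str, key: str) -> str:
    """Return the browser-facing path for an S3 key served through Nginx."""
    prefix = PROXY_PREFIXES[side]  # KeyError for anything other than "in"/"out"
    # Percent-encode spaces and other unsafe characters, but keep "/" separators.
    return prefix + quote(key.lstrip("/"), safe="/")
```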

Upload Files to MinIO

# Using mc CLI
mc alias set local http://localhost:9000 minioadmin minioadmin
mc cp video.mp4 local/mpr-media-in/

# Using aws CLI with endpoint override
aws --endpoint-url http://localhost:9000 s3 cp video.mp4 s3://mpr-media-in/

AWS Production (S3)

Configuration

# Leave S3_ENDPOINT_URL unset so boto3 uses the real AWS S3 endpoint
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=<real-key>
AWS_SECRET_ACCESS_KEY=<real-secret>

Upload Files to S3

aws s3 cp video.mp4 s3://mpr-media-in/
aws s3 sync /local/media/ s3://mpr-media-in/

Storage Module

core/storage.py provides all S3 operations:

from core.storage import (
    get_s3_client,     # boto3 client (MinIO or AWS)
    list_objects,      # List bucket contents, filter by extension
    download_file,     # Download S3 object to local path
    download_to_temp,  # Download to temp file (caller cleans up)
    upload_file,       # Upload local file to S3
    get_presigned_url, # Generate presigned URL
    BUCKET_IN,         # Input bucket name
    BUCKET_OUT,        # Output bucket name
)
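Here is a sketch of how two of these helpers could be written. It is not the implementation in core/storage.py. The client is passed in explicitly, an assumption that makes the functions easy to test, and the real signatures may differ. Listing goes through the list_objects_v2 paginator because a single ListObjectsV2 call returns at most 1,000 keys.

```python
from typing import Iterable, List


def list_objects(client, bucket: str, extensions: Iterable[str] = ()) -> List[str]:
    """Return every key in bucket, optionally filtered by file extension."""
    wanted = tuple(ext.lower() for ext in extensions)
    keys: List[str] = []
    # Each ListObjectsV2 page holds at most 1,000 keys, so walk all pages.
    for page in client.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for obj in page.get("Contents", []):  # pages with no objects lack "Contents"
            key = obj["Key"]
            if not wanted or key.lower().endswith(wanted):
                keys.append(key)
    return keys


def get_presigned_url(client, bucket: str, key: str, expires_in: int = 3600) -> str:
    """Return a time-limited GET URL for one object (default: one hour)."""
    return client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```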

API Endpoints

Scan Media (REST)

POST /api/assets/scan

Lists objects in S3_BUCKET_IN and registers any new media files.

Scan Media (GraphQL)

mutation { scanMediaFolder { found registered skipped files } }
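The scan reports found, registered, skipped and files. This minimal sketch of that bookkeeping makes several assumptions. Found counts the media keys listed in S3_BUCKET_IN, skipped counts keys already in the database, and files lists only the newly registered keys. ScanResult and scan are hypothetical names, and a set of known keys stands in for the real database lookup.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class ScanResult:
    found: int = 0
    registered: int = 0
    skipped: int = 0
    files: List[str] = field(default_factory=list)


def scan(media_keys: Iterable[str], known_keys: Set[str]) -> ScanResult:
    """Register keys not yet known; skip the rest (database replaced by a set)."""
    result = ScanResult()
    seen = set(known_keys)  # copy so the caller's set is not modified
    for key in media_keys:
        result.found += 1
        if key in seen:
            result.skipped += 1
            continue
        seen.add(key)  # stand-in for creating the asset record
        result.registered += 1
        result.files.append(key)
    return result
```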

Job Flow with S3

Local Mode (Celery)

  1. The Celery task receives source_key and output_key
  2. Downloads the source from S3_BUCKET_IN to a temp file
  3. Runs FFmpeg locally
  4. Uploads the result to S3_BUCKET_OUT
  5. Cleans up the temp files
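The five Celery steps can be sketched as one function. This is not the project's task code. run_job, its parameters and the ffmpeg arguments in the usage note are assumptions, and the S3 client is passed in so the flow can run against a stub. A temporary directory holds both the source and the output, so step 5 happens automatically when the with block exits, even if FFmpeg fails.

```python
import os
import subprocess
import tempfile
from pathlib import PurePosixPath
from typing import Callable, List


def run_job(
    client,
    bucket_in: str,
    bucket_out: str,
    source_key: str,
    output_key: str,
    build_cmd: Callable[[str, str], List[str]],
    run: Callable[..., object] = subprocess.run,
) -> str:
    """Download source_key, run FFmpeg on it, upload the result as output_key."""
    with tempfile.TemporaryDirectory() as workdir:
        # Keep the original extensions; FFmpeg picks the output container from the file name.
        src_path = os.path.join(workdir, "source" + PurePosixPath(source_key).suffix)
        out_path = os.path.join(workdir, "output" + PurePosixPath(output_key).suffix)
        client.download_file(bucket_in, source_key, src_path)  # step 2
        run(build_cmd(src_path, out_path), check=True)          # step 3 (raises on failure)
        client.upload_file(out_path, bucket_out, output_key)    # step 4
    # Step 5: the temporary directory and both files have been removed here.
    return output_key
```

For example, a 10-second trim without re-encoding could pass build_cmd=lambda src, out: ["ffmpeg", "-y", "-i", src, "-t", "10", "-c", "copy", out].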

Lambda Mode (AWS)

  1. Step Functions invokes the Lambda function with the S3 keys
  2. Lambda downloads the source from S3_BUCKET_IN to /tmp
  3. Runs FFmpeg inside the Lambda container
  4. Uploads the result to S3_BUCKET_OUT
  5. Calls back to the API with the result

Both paths use the same S3 buckets and key structure.

Supported File Types

  • Video: .mp4, .mkv, .avi, .mov, .webm, .flv, .wmv, .m4v
  • Audio: .mp3, .wav, .flac, .aac, .ogg, .m4a
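Here is a sketch of checking a key against these lists. The constant names, media_kind and the case-insensitive match are assumptions. The extension lists themselves are copied from this section.

```python
from pathlib import PurePosixPath
from typing import Optional

VIDEO_EXTENSIONS = frozenset({".mp4", ".mkv", ".avi", ".mov", ".webm", ".flv", ".wmv", ".m4v"})
AUDIO_EXTENSIONS = frozenset({".mp3", ".wav", ".flac", ".aac", ".ogg", ".m4a"})


def media_kind(key: str) -> Optional[str]:
    """Return 'video', 'audio', or None for an S3 key, based on its last extension."""
    suffix = PurePosixPath(key).suffix.lower()  # 'subfolder/Clip.MP4' -> '.mp4'
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return None
```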