# Media Storage Architecture

## Overview

MPR uses **S3-compatible storage** everywhere: locally via MinIO, in production via AWS S3. The same boto3 code and S3 keys work in both environments; the only difference is the `S3_ENDPOINT_URL` env var.

## Storage Strategy

### S3 Buckets

| Bucket | Env Var | Purpose |
|--------|---------|---------|
| `mpr-media-in` | `S3_BUCKET_IN` | Source media files |
| `mpr-media-out` | `S3_BUCKET_OUT` | Transcoded/trimmed output |

### S3 Keys as File Paths

- **Database**: stores S3 object keys (e.g., `video1.mp4`, `subfolder/video3.mp4`)
- **Local dev**: MinIO serves these via the S3 API on port 9000
- **AWS**: real S3, same keys, different endpoint

### Why S3 Everywhere?

1. **Identical code paths** - no branching between local and cloud
2. **Seamless executor switching** - Celery and Lambda both use boto3
3. **Cloud-native** - ready for production without refactoring

## Local Development (MinIO)

### Configuration

```bash
S3_ENDPOINT_URL=http://minio:9000
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin
```

### How It Works

- MinIO runs as a Docker container (port 9000 API, port 9001 console)
- A `minio-init` container creates the buckets and sets public read access on startup
- Nginx proxies `/media/in/` and `/media/out/` to the MinIO buckets
- Upload files via the MinIO Console (http://localhost:9001) or the `mc` CLI

### Upload Files to MinIO

```bash
# Using mc CLI
mc alias set local http://localhost:9000 minioadmin minioadmin
mc cp video.mp4 local/mpr-media-in/

# Using aws CLI with endpoint override
aws --endpoint-url http://localhost:9000 s3 cp video.mp4 s3://mpr-media-in/
```

## AWS Production (S3)

### Configuration

```bash
# No S3_ENDPOINT_URL = uses real AWS S3
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
```

### Upload Files to S3

```bash
aws s3 cp video.mp4 s3://mpr-media-in/
aws s3 sync /local/media/ s3://mpr-media-in/
```

## Storage Module

`core/storage.py` provides all S3 operations:

```python
from core.storage import (
    get_s3_client,      # boto3 client (MinIO or AWS)
    list_objects,       # List bucket contents, filter by extension
    download_file,      # Download S3 object to local path
    download_to_temp,   # Download to temp file (caller cleans up)
    upload_file,        # Upload local file to S3
    get_presigned_url,  # Generate presigned URL
    BUCKET_IN,          # Input bucket name
    BUCKET_OUT,         # Output bucket name
)
```

## API Endpoints

### Scan Media (REST)

```http
POST /api/assets/scan
```

Lists objects in `S3_BUCKET_IN` and registers any new media files.

### Scan Media (GraphQL)

```graphql
mutation {
  scanMediaFolder {
    found
    registered
    skipped
    files
  }
}
```

## Job Flow with S3

### Local Mode (Celery)

1. Celery task receives `source_key` and `output_key`
2. Downloads source from `S3_BUCKET_IN` to a temp file
3. Runs FFmpeg locally
4. Uploads result to `S3_BUCKET_OUT`
5. Cleans up temp files

### Lambda Mode (AWS)

1. Step Functions invokes Lambda with the S3 keys
2. Lambda downloads source from `S3_BUCKET_IN` to `/tmp`
3. Runs FFmpeg in the container
4. Uploads result to `S3_BUCKET_OUT`
5. Calls back to the API with the result

Both paths use the same S3 buckets and key structure.

## Supported File Types

**Video:** `.mp4`, `.mkv`, `.avi`, `.mov`, `.webm`, `.flv`, `.wmv`, `.m4v`

**Audio:** `.mp3`, `.wav`, `.flac`, `.aac`, `.ogg`, `.m4a`
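
## Endpoint Switching Sketch

The single-code-path design above hinges on one detail: `boto3.client("s3", endpoint_url=None)` targets real AWS S3, while a non-None `endpoint_url` targets MinIO. A minimal sketch of how `get_s3_client` could implement this (the `resolve_endpoint` helper is hypothetical; the actual `core/storage.py` may differ):

```python
import os

def resolve_endpoint(env=None):
    """Pick the S3 endpoint from the environment.

    Returns the MinIO URL when S3_ENDPOINT_URL is set (local dev),
    or None, which tells boto3 to use the default AWS S3 endpoint.
    """
    env = os.environ if env is None else env
    return env.get("S3_ENDPOINT_URL") or None

def get_s3_client():
    # Imported lazily here; boto3 reads AWS_ACCESS_KEY_ID /
    # AWS_SECRET_ACCESS_KEY (MinIO or AWS creds) from the environment.
    import boto3
    return boto3.client("s3", endpoint_url=resolve_endpoint())
```

With this shape, the Celery and Lambda executors call the same `get_s3_client()` and differ only in which environment variables are present.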