# Media Storage Architecture

## Overview

MPR uses **S3-compatible storage** everywhere: MinIO locally and AWS S3 in production. The same boto3 code and S3 keys work in both environments; apart from credentials and region, the only difference is whether the `S3_ENDPOINT_URL` env var is set.
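
A minimal sketch of how one client factory could serve both environments. The helper names `s3_client_kwargs` and `make_s3_client` are hypothetical and not taken from `core/storage.py`; only the env var names come from this document.

```python
import os


def s3_client_kwargs(env=None):
    """Build keyword arguments for boto3.client("s3", ...).

    Hypothetical helper: only the env var names come from this document.
    """
    env = os.environ if env is None else env
    kwargs = {}
    endpoint = env.get("S3_ENDPOINT_URL")
    if endpoint:
        # Set for MinIO; left unset in production so boto3 talks to AWS S3.
        kwargs["endpoint_url"] = endpoint
    region = env.get("AWS_REGION")
    if region:
        kwargs["region_name"] = region
    return kwargs


def make_s3_client(env=None):
    # Imported lazily so the pure helper above has no third-party dependency.
    import boto3

    return boto3.client("s3", **s3_client_kwargs(env))
```

boto3 reads `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` from the environment on its own, so the sketch does not pass credentials explicitly.
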

## Storage Strategy

### S3 Buckets

| Bucket | Env Var | Purpose |
|--------|---------|---------|
| `mpr-media-in` | `S3_BUCKET_IN` | Source media files |
| `mpr-media-out` | `S3_BUCKET_OUT` | Transcoded/trimmed output |

### S3 Keys as File Paths
- **Database**: Stores S3 object keys (e.g., `video1.mp4`, `subfolder/video3.mp4`)
- **Local dev**: MinIO serves these via S3 API on port 9000
- **AWS**: Real S3, same keys, different endpoint

### Why S3 Everywhere?
1. **Identical code paths** - no branching between local and cloud
2. **Seamless executor switching** - Celery and Lambda both use boto3
3. **Cloud-native** - ready for production without refactoring

## Local Development (MinIO)

### Configuration
```bash
S3_ENDPOINT_URL=http://minio:9000
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin
```

### How It Works
- MinIO runs as a Docker container (port 9000 API, port 9001 console)
- `minio-init` container creates buckets and sets public read access on startup
- Nginx proxies `/media/in/` and `/media/out/` to MinIO buckets
- Upload files via MinIO Console (http://localhost:9001) or `mc` CLI
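
The Nginx proxying above suggests that a stored key can be turned into a browser-facing path. The sketch below is illustrative only: `media_url` and `PROXY_PREFIXES` are hypothetical names, and it assumes `/media/in/` maps to `mpr-media-in`, `/media/out/` maps to `mpr-media-out`, and the key is appended unchanged apart from URL quoting.

```python
from urllib.parse import quote

# Assumed mapping from bucket name to the Nginx location that proxies it.
PROXY_PREFIXES = {
    "mpr-media-in": "/media/in/",
    "mpr-media-out": "/media/out/",
}


def media_url(bucket, key):
    """Return the proxied path for an S3 key (hypothetical helper)."""
    prefix = PROXY_PREFIXES[bucket]
    # quote() leaves "/" untouched by default, so nested keys stay nested paths.
    return prefix + quote(key)
```

With these assumptions, `media_url("mpr-media-in", "subfolder/video3.mp4")` yields `/media/in/subfolder/video3.mp4`.
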

### Upload Files to MinIO
```bash
# Using mc CLI
mc alias set local http://localhost:9000 minioadmin minioadmin
mc cp video.mp4 local/mpr-media-in/

# Using aws CLI with endpoint override
aws --endpoint-url http://localhost:9000 s3 cp video.mp4 s3://mpr-media-in/
```

## AWS Production (S3)

### Configuration
```bash
# No S3_ENDPOINT_URL = uses real AWS S3
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=<real-key>
AWS_SECRET_ACCESS_KEY=<real-secret>
```

### Upload Files to S3
```bash
aws s3 cp video.mp4 s3://mpr-media-in/
aws s3 sync /local/media/ s3://mpr-media-in/
```

## Storage Module

`core/storage.py` provides all S3 operations:

```python
from core.storage import (
    get_s3_client,      # boto3 client (MinIO or AWS)
    list_objects,       # List bucket contents, filter by extension
    download_file,      # Download S3 object to local path
    download_to_temp,   # Download to temp file (caller cleans up)
    upload_file,        # Upload local file to S3
    get_presigned_url,  # Generate presigned URL
    BUCKET_IN,          # Input bucket name
    BUCKET_OUT,         # Output bucket name
)
```
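
As an illustration of what "list bucket contents, filter by extension" can look like with boto3, here is a sketch built on the `list_objects_v2` paginator. The function name `list_media_keys` and its parameters are assumptions; the actual signature of `list_objects` in `core/storage.py` is not shown in this document and may differ.

```python
def list_media_keys(client, bucket, extensions=None, prefix=""):
    """Return object keys in `bucket`, optionally filtered by extension.

    Hypothetical sketch; `client` is any boto3 S3 client.
    """
    wanted = tuple(ext.lower() for ext in extensions) if extensions else None
    keys = []
    # Pagination matters: a single list call returns at most 1000 objects.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if wanted is None or key.lower().endswith(wanted):
                keys.append(key)
    return keys
```

Passing the client in as an argument keeps the function independent of how the client was created, so the same code runs against MinIO and AWS.
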

## API Endpoints

### Scan Media (REST)
```http
POST /api/assets/scan
```
Lists objects in `S3_BUCKET_IN`, registers new media files.

### Scan Media (GraphQL)
```graphql
mutation { scanMediaFolder { found registered skipped files } }
```
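
The bookkeeping behind a scan can be sketched as a pure function. The field meanings here are assumptions inferred from the field names in the mutation: `found` is every media key seen in the bucket, `registered` counts keys not yet in the database, `skipped` counts keys already registered, and `files` lists the newly registered keys. `scan_summary` is a hypothetical name.

```python
def scan_summary(bucket_keys, known_keys):
    """Summarise a scan given the bucket listing and the keys already stored.

    Hypothetical sketch; the field semantics are assumed, not confirmed.
    """
    known = set(known_keys)
    new_files = [key for key in bucket_keys if key not in known]
    return {
        "found": len(bucket_keys),
        "registered": len(new_files),
        "skipped": len(bucket_keys) - len(new_files),
        "files": new_files,
    }
```

Under these assumptions, rescanning an unchanged bucket registers nothing and skips everything, which makes the scan safe to repeat.
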

## Job Flow with S3

### Local Mode (Celery)
1. Celery task receives `source_key` and `output_key`
2. Downloads source from `S3_BUCKET_IN` to temp file
3. Runs FFmpeg locally
4. Uploads result to `S3_BUCKET_OUT`
5. Cleans up temp files
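
The five steps above can be sketched as one function. This is not the project's Celery task: `run_job` and `ffmpeg_transcode` are hypothetical names, the default bucket names are taken from the table in this document, and the FFmpeg arguments are a placeholder for whatever transcode or trim options the real job uses.

```python
import os
import subprocess
import tempfile


def ffmpeg_transcode(src_path, out_path):
    # -y overwrites out_path, which mkstemp has already created as an empty file.
    subprocess.run(["ffmpeg", "-y", "-i", src_path, out_path], check=True)


def run_job(client, source_key, output_key, transcode=ffmpeg_transcode,
            bucket_in="mpr-media-in", bucket_out="mpr-media-out"):
    """Download, process, upload, and clean up (hypothetical sketch)."""
    # Keep the file extensions so FFmpeg can infer the container formats.
    src_fd, src_path = tempfile.mkstemp(suffix=os.path.splitext(source_key)[1])
    os.close(src_fd)
    out_fd, out_path = tempfile.mkstemp(suffix=os.path.splitext(output_key)[1])
    os.close(out_fd)
    try:
        client.download_file(bucket_in, source_key, src_path)
        transcode(src_path, out_path)
        client.upload_file(out_path, bucket_out, output_key)
    finally:
        # Step 5: remove temp files even when a step above raised.
        for path in (src_path, out_path):
            if os.path.exists(path):
                os.remove(path)
```

Because the S3 client and the transcode step are both parameters, the same function body would fit a Lambda handler that passes a client created without `S3_ENDPOINT_URL`.
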

### Lambda Mode (AWS)
1. Step Functions invokes Lambda with S3 keys
2. Lambda downloads source from `S3_BUCKET_IN` to `/tmp`
3. Runs FFmpeg in container
4. Uploads result to `S3_BUCKET_OUT`
5. Calls back to API with result

Both paths use the same S3 buckets and key structure.

## Supported File Types

**Video:** `.mp4`, `.mkv`, `.avi`, `.mov`, `.webm`, `.flv`, `.wmv`, `.m4v`

**Audio:** `.mp3`, `.wav`, `.flac`, `.aac`, `.ogg`, `.m4a`
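
A small sketch of how a key could be classified against these lists. `media_kind` and the two constants are hypothetical names, and treating extensions case-insensitively is an assumption, not something this document states.

```python
import os

VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".webm", ".flv", ".wmv", ".m4v"}
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".aac", ".ogg", ".m4a"}


def media_kind(key):
    """Return "video", "audio", or None for an S3 key (hypothetical helper)."""
    # Assumption: extension matching ignores case, so "clip.MP4" counts as video.
    ext = os.path.splitext(key)[1].lower()
    if ext in VIDEO_EXTENSIONS:
        return "video"
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    return None
```
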