# Merge aws-int: Add AWS integration with GraphQL, Step Functions, and Lambda

## Overview

MPR uses **S3-compatible storage** everywhere, with media separated into independent **input** and **output** buckets: locally via MinIO, in production via AWS S3. The same boto3 code and the same S3 keys work in both environments - the only difference is the `S3_ENDPOINT_URL` env var.
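The endpoint switch can be sketched as a small helper. This is a hypothetical `s3_client_kwargs` (not necessarily how `core/storage.py` spells it), assuming only that boto3's client accepts an optional `endpoint_url`:

```python
import os

def s3_client_kwargs() -> dict:
    """Build kwargs for boto3.client("s3").

    Locally, S3_ENDPOINT_URL points at MinIO (http://minio:9000);
    in AWS it is unset, so boto3 falls back to the real S3 endpoint.
    """
    kwargs = {"region_name": os.environ.get("AWS_REGION", "us-east-1")}
    endpoint = os.environ.get("S3_ENDPOINT_URL")
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    return kwargs

# client = boto3.client("s3", **s3_client_kwargs())
```

No other code needs to branch on environment; everything downstream just uses the client.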

## Storage Strategy

### S3 Buckets

| Bucket | Env Var | Purpose |
|--------|---------|---------|
| `mpr-media-in` | `S3_BUCKET_IN` | Source media files |
| `mpr-media-out` | `S3_BUCKET_OUT` | Transcoded/trimmed output |

In production, input and output can point to different buckets or even different accounts.
### S3 Keys as File Paths

- **Database**: stores S3 object keys (e.g., `video1.mp4`, `subfolder/video3.mp4`)
- **Local dev**: MinIO serves these via the S3 API on port 9000
- **AWS**: real S3, same keys, different endpoint

### Why S3 Everywhere?

1. **Identical code paths** - no branching between local and cloud
2. **Seamless executor switching** - Celery and Lambda both use boto3
3. **Cloud-native** - ready for production without refactoring

## Local Development (MinIO)

### Configuration
```bash
S3_ENDPOINT_URL=http://minio:9000
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin
```
### File Structure

```
mpr-media-in/                # Source files
├── video1.mp4
├── video2.mp4
└── subfolder/
    └── video3.mp4

mpr-media-out/               # Transcoded output
├── video1_h264.mp4
└── video2_trimmed.mp4
```
### How It Works

- MinIO runs as a Docker container (port 9000 API, port 9001 console)
- The `minio-init` container creates the buckets and sets public read access on startup
- Nginx proxies `/media/in/` and `/media/out/` to the MinIO buckets
- Upload files via the MinIO Console (http://localhost:9001) or the `mc` CLI
### Database Storage

```
# Source assets (scanned from S3_BUCKET_IN)
filename: video1.mp4
file_path: video1.mp4

filename: video3.mp4
file_path: subfolder/video3.mp4
```
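The two columns above are derivable from the object key alone; a sketch of how the split might be done (the actual scan code may differ):

```python
import posixpath

def split_key(key: str) -> tuple[str, str]:
    """Derive (filename, file_path) for an asset row from its S3 key.

    S3 keys always use '/' separators, so posixpath is safe on any OS.
    """
    return posixpath.basename(key), key
```

For example, `subfolder/video3.mp4` yields filename `video3.mp4` with the full key kept as `file_path`.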
### URL Serving

- Nginx proxies `/media/in/` and `/media/out/` to the corresponding MinIO buckets
- Frontend accesses: `http://mpr.local.ar/media/in/video1.mp4`
- Video player: `<video src="/media/in/video1.mp4" />`
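Building a frontend URL is just joining the proxied base path with the object key. A minimal sketch (the helper name is illustrative, not part of the codebase):

```python
def media_url(key: str, base: str = "/media/in") -> str:
    """Map an S3 object key to its Nginx-proxied URL."""
    return f"{base.rstrip('/')}/{key.lstrip('/')}"
```

Because keys already contain their prefixes (`subfolder/video3.mp4`), no extra path bookkeeping is needed.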
### Upload Files to MinIO

```bash
# Using the mc CLI
mc alias set local http://localhost:9000 minioadmin minioadmin
mc cp video.mp4 local/mpr-media-in/

# Using the aws CLI with an endpoint override
aws --endpoint-url http://localhost:9000 s3 cp video.mp4 s3://mpr-media-in/
```
## AWS Production (S3)

### Configuration
```bash
# No S3_ENDPOINT_URL = uses real AWS S3
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=<real-key>
AWS_SECRET_ACCESS_KEY=<real-secret>
```
### Upload Files to S3

```bash
aws s3 cp video.mp4 s3://mpr-media-in/
aws s3 sync /local/media/ s3://mpr-media-in/
```

The database rows are identical in both environments - the same keys (`video1.mp4`, `subfolder/video3.mp4`) work against MinIO and AWS.
## Storage Module

`core/storage.py` provides all S3 operations:

```python
from core.storage import (
    get_s3_client,       # boto3 client (MinIO or AWS)
    list_objects,        # List bucket contents, filter by extension
    download_file,       # Download S3 object to local path
    download_to_temp,    # Download to temp file (caller cleans up)
    upload_file,         # Upload local file to S3
    get_presigned_url,   # Generate presigned URL
    BUCKET_IN,           # Input bucket name
    BUCKET_OUT,          # Output bucket name
)
```

## API Endpoints

### Scan Media (REST)
```http
POST /api/assets/scan
```

Lists objects in `S3_BUCKET_IN` and registers new media files.

**Behavior:**

1. Lists all objects in `S3_BUCKET_IN`
2. Finds all video/audio files (mp4, mkv, avi, mov, mp3, wav, etc.)
3. Stores the S3 object key as the asset's `file_path`
4. Skips already-registered files (by filename)
5. Returns summary: `{ found, registered, skipped, files }`
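The scan behavior can be sketched in pure Python. The extension set and helper name here are illustrative, not the actual endpoint code:

```python
import posixpath

# Assumed media extensions; the real scan supports more ("etc.")
MEDIA_EXTS = {".mp4", ".mkv", ".avi", ".mov", ".mp3", ".wav"}

def scan_bucket(keys, registered_filenames):
    """Filter a bucket listing down to media files and split new vs known.

    keys: object keys, e.g. from list_objects(BUCKET_IN)
    registered_filenames: filenames already present in the database
    """
    media = [k for k in keys
             if posixpath.splitext(k)[1].lower() in MEDIA_EXTS]
    new = [k for k in media
           if posixpath.basename(k) not in registered_filenames]
    return {
        "found": len(media),
        "registered": len(new),
        "skipped": len(media) - len(new),
        "files": new,
    }
```

Note the skip check is by filename, matching the behavior above, so two keys with the same basename would collide.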
### Scan Media (GraphQL)

```graphql
mutation { scanMediaFolder { found registered skipped files } }
```

### Create Job

```http
POST /api/jobs/
Content-Type: application/json

{
  "source_asset_id": "uuid",
  "preset_id": "uuid",
  "trim_start": 10.0,
  "trim_end": 30.0
}
```

**Behavior:**

- The server generates the output key; the result lands in `S3_BUCKET_OUT`
- Output goes to the output bucket, never alongside the source files
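One plausible naming scheme for the generated output key, matching the `video1_h264.mp4` / `video2_trimmed.mp4` examples elsewhere in this doc (the server's actual generator may differ):

```python
import posixpath

def make_output_key(source_key: str, preset_name: str) -> str:
    """Derive the S3_BUCKET_OUT key from the source key and preset name."""
    stem, ext = posixpath.splitext(posixpath.basename(source_key))
    return f"{stem}_{preset_name}{ext}"
```

Dropping the source prefix keeps output keys flat regardless of how inputs are organized.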

## Job Flow with S3

### Local Mode (Celery)
1. Celery task receives `source_key` and `output_key`
2. Downloads the source from `S3_BUCKET_IN` to a temp file
3. Runs FFmpeg locally
4. Uploads the result to `S3_BUCKET_OUT`
5. Cleans up temp files

### Lambda Mode (AWS)
1. Step Functions invokes the Lambda with S3 keys
2. Lambda downloads the source from `S3_BUCKET_IN` to `/tmp`
3. Runs FFmpeg in the container
4. Uploads the result to `S3_BUCKET_OUT`
5. Calls back to the API with the result
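Both modes share the same download → FFmpeg → upload shape. A simplified sketch with the S3 and FFmpeg calls injected as callables (the real Celery task and Lambda handler differ in wiring, not in shape):

```python
import os
import tempfile

def run_transcode(download, transcode, upload,
                  source_key: str, output_key: str) -> None:
    """Shared job shape for both executors.

    download(key, local_path): pull from S3_BUCKET_IN
    transcode(src, dst):       run FFmpeg
    upload(local_path, key):   push to S3_BUCKET_OUT

    The scratch directory (a worker temp dir, or Lambda's /tmp) is
    removed automatically when the job finishes.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source" + os.path.splitext(source_key)[1])
        dst = os.path.join(tmp, "output" + os.path.splitext(output_key)[1])
        download(source_key, src)
        transcode(src, dst)
        upload(dst, output_key)
```

Injecting the three callables is what makes executor switching seamless: only the wiring differs between Celery and Lambda.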
Both paths use the same S3 buckets and key structure.

## Migration Guide

### Moving from Local (MinIO) to AWS S3

1. **Copy media out of MinIO into the AWS buckets:**

   ```bash
   # Pull from MinIO, push to AWS (repeat for mpr-media-out)
   aws --endpoint-url http://localhost:9000 s3 sync s3://mpr-media-in /tmp/media-in
   aws s3 sync /tmp/media-in s3://mpr-media-in
   ```

2. **Update environment variables:**

   ```bash
   # Remove S3_ENDPOINT_URL; switch to real AWS credentials
   AWS_REGION=us-east-1
   AWS_ACCESS_KEY_ID=<real-key>
   AWS_SECRET_ACCESS_KEY=<real-secret>
   ```

3. **Database keys remain unchanged** (the same S3 keys work in both environments)
## Supported File Types