shoehorning graphql, step functions and lambdas. aws deployment scripts

2026-02-06 18:25:42 -03:00
parent 013587d108
commit e642908abb
35 changed files with 2354 additions and 930 deletions


## Overview
MPR uses **S3-compatible storage** everywhere. Locally via MinIO, in production via AWS S3. The same boto3 code and S3 keys work in both environments - the only difference is the `S3_ENDPOINT_URL` env var.
## Storage Strategy
### S3 Buckets
| Bucket | Env Var | Purpose |
|--------|---------|---------|
| `mpr-media-in` | `S3_BUCKET_IN` | Source media files |
| `mpr-media-out` | `S3_BUCKET_OUT` | Transcoded/trimmed output |
These can point to different locations or even different servers/buckets in production.
### S3 Keys as File Paths
- **Database**: Stores S3 object keys (e.g., `video1.mp4`, `subfolder/video3.mp4`)
- **Local dev**: MinIO serves these via S3 API on port 9000
- **AWS**: Real S3, same keys, different endpoint
### Why S3 Everywhere?
1. **Identical code paths** - no branching between local and cloud
2. **Seamless executor switching** - Celery and Lambda both use boto3
3. **Cloud-native** - ready for production without refactoring
## Local Development (MinIO)
### Configuration
```bash
S3_ENDPOINT_URL=http://minio:9000
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin
```
### Bucket Structure
```
mpr-media-in/           # Source files
├── video1.mp4
├── video2.mp4
└── subfolder/
    └── video3.mp4
mpr-media-out/          # Transcoded output
├── video1_h264.mp4
└── video2_trimmed.mp4
```
### How It Works
- MinIO runs as a Docker container (port 9000 API, port 9001 console)
- `minio-init` container creates buckets and sets public read access on startup
- Nginx proxies `/media/in/` and `/media/out/` to MinIO buckets
- Upload files via MinIO Console (http://localhost:9001) or `mc` CLI
### Database Storage
```
# Source assets (scanned from S3_BUCKET_IN)
filename: video1.mp4
file_path: video1.mp4
filename: video3.mp4
file_path: subfolder/video3.mp4
```
### URL Serving
- Nginx proxies `/media/in/` and `/media/out/` to the MinIO buckets
- Frontend accesses: `http://mpr.local.ar/media/in/video1.mp4`
- Video player: `<video src="/media/in/video1.mp4" />`
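Turning a stored object key into a playable URL is plain string joining. A hypothetical helper (this function is an illustration, not part of `core/storage.py`):

```python
def media_url(base: str, key: str) -> str:
    """Join a base path (e.g. "/media/in") with an S3 object key from the DB."""
    # Normalize slashes so nested keys like "subfolder/video3.mp4" work too.
    return f"{base.rstrip('/')}/{key.lstrip('/')}"
```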
### Upload Files to MinIO
```bash
# Using mc CLI
mc alias set local http://localhost:9000 minioadmin minioadmin
mc cp video.mp4 local/mpr-media-in/
# Using aws CLI with endpoint override
aws --endpoint-url http://localhost:9000 s3 cp video.mp4 s3://mpr-media-in/
```
## AWS Production (S3)
### Configuration
```bash
# No S3_ENDPOINT_URL = uses real AWS S3
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=<real-key>
AWS_SECRET_ACCESS_KEY=<real-secret>
```
### Upload Files to S3
```bash
aws s3 cp video.mp4 s3://mpr-media-in/
aws s3 sync /local/media/ s3://mpr-media-in/
```
### Database Storage (Same!)
```
filename: video1.mp4
file_path: video1.mp4
filename: video3.mp4
file_path: subfolder/video3.mp4
```
## Storage Module
`core/storage.py` provides all S3 operations:
```python
from core.storage import (
get_s3_client, # boto3 client (MinIO or AWS)
list_objects, # List bucket contents, filter by extension
download_file, # Download S3 object to local path
download_to_temp, # Download to temp file (caller cleans up)
upload_file, # Upload local file to S3
get_presigned_url, # Generate presigned URL
BUCKET_IN, # Input bucket name
BUCKET_OUT, # Output bucket name
)
```
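As one illustration, the extension filtering inside `list_objects` might be factored like this (function and constant names here are assumptions, not the module's actual API):

```python
import os
from typing import Iterator

# Assumed set of media extensions; the real module may recognize more.
MEDIA_EXTS = {".mp4", ".mkv", ".avi", ".mov", ".mp3", ".wav"}


def is_media_key(key: str) -> bool:
    """Check an S3 object key's extension, case-insensitively."""
    return os.path.splitext(key)[1].lower() in MEDIA_EXTS


def list_media_keys(client, bucket: str) -> Iterator[str]:
    """Yield media object keys in a bucket, recursing through key prefixes."""
    # Paginate so buckets with more than 1000 objects are fully listed.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            if is_media_key(obj["Key"]):
                yield obj["Key"]
```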
## API Endpoints
### Scan Media (REST)
```http
POST /api/assets/scan
```
Lists objects in `S3_BUCKET_IN`, registers new media files.
**Behavior:**
1. Lists all objects in `S3_BUCKET_IN` (including nested key prefixes)
2. Filters for video/audio files (mp4, mkv, avi, mov, mp3, wav, etc.)
3. Stores the full S3 object key as `file_path`
4. Skips already-registered files (by filename)
5. Returns summary: `{ found, registered, skipped, files }`
### Create Job
```http
POST /api/jobs/
Content-Type: application/json
{
"source_asset_id": "uuid",
"preset_id": "uuid",
"trim_start": 10.0,
"trim_end": 30.0
}
```
**Behavior:**
- Server sets the output key using `S3_BUCKET_OUT` + a generated filename
- Output goes to the output bucket, not alongside source files
### Scan Media (GraphQL)
```graphql
mutation { scanMediaFolder { found registered skipped files } }
```
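The generated output filename scheme is not specified here; a hypothetical key builder could look like:

```python
import os
import uuid


def build_output_key(source_key: str, preset_name: str) -> str:
    """Derive an output object key from a source key and a preset name.

    Hypothetical scheme: "subfolder/video3.mp4" + "h264"
    -> "video3_h264_<shortid>.mp4", flat in the output bucket.
    """
    stem = os.path.splitext(os.path.basename(source_key))[0]
    return f"{stem}_{preset_name}_{uuid.uuid4().hex[:8]}.mp4"
```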
## Job Flow with S3
### Local Mode (Celery)
1. Celery task receives `source_key` and `output_key`
2. Downloads source from `S3_BUCKET_IN` to temp file
3. Runs FFmpeg locally
4. Uploads result to `S3_BUCKET_OUT`
5. Cleans up temp files
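Step 3's FFmpeg invocation can be sketched as a pure argument builder (the codec choice and trim flags below are assumptions; the real task may pass different options):

```python
def ffmpeg_args(src: str, dst: str, trim_start=None, trim_end=None) -> list:
    """Build an ffmpeg command for transcode + optional trim."""
    args = ["ffmpeg", "-y"]
    if trim_start is not None:
        # -ss before -i seeks the input before decoding starts.
        args += ["-ss", str(trim_start)]
    args += ["-i", src]
    if trim_start is not None and trim_end is not None:
        # -t limits output duration; combined with -ss this trims a segment.
        args += ["-t", str(trim_end - trim_start)]
    args += ["-c:v", "libx264", dst]
    return args
```

The same builder would serve both executors, since Celery and Lambda run an identical FFmpeg step.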
### Lambda Mode (AWS)
1. Step Functions invokes Lambda with S3 keys
2. Lambda downloads source from `S3_BUCKET_IN` to `/tmp`
3. Runs FFmpeg in container
4. Uploads result to `S3_BUCKET_OUT`
5. Calls back to API with result
Both paths use the same S3 buckets and key structure.
## Supported File Types