# Media & Artifact Storage

## Overview

MPR stores everything on **S3-compatible** object storage. Locally that's MinIO; in any cloud target (AWS, GCS via HMAC, Cloudflare R2, etc.) it's the provider's S3 API. The code in `core/storage/` uses boto3 throughout — only the endpoint URL and credentials change between environments.

## What goes where

| Bucket / prefix | Contents | Producer | Consumer |
|---|---|---|---|
| `mpr-media-in` | Source video files (chunks the user uploaded or device-recorded) | user / chunker UI | `extract_frames` stage, `core/api/detect/sources.py` |
| `mpr-media-out` | Per-job artifacts: extracted frame caches, debug overlays | pipeline stages, `core/api/detect/replay.py` overlay endpoints | UI panels (frame strip, overlay viewer) |

Both buckets live behind the same S3 client (`core/storage/`). DB rows store relative keys (e.g. `chunks/2025-04-15/match-01.mp4`); the bucket is implicit.

## Local development (MinIO)

```bash
S3_ENDPOINT_URL=http://minio:9000
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin
```

In the Tilt setup, MinIO runs as a k8s Deployment with port-forwards for `9000` (S3 API) and `9001` (web console). A `minio-init` job creates the buckets on first start.

## Cloud (AWS S3 / GCS / others)

```bash
# AWS S3 — no endpoint URL needed
S3_BUCKET_IN=...
S3_BUCKET_OUT=...
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...

# GCS via HMAC
S3_ENDPOINT_URL=https://storage.googleapis.com
AWS_ACCESS_KEY_ID=...       # GCS HMAC access key
AWS_SECRET_ACCESS_KEY=...   # GCS HMAC secret
```

## Database vs. object storage

Heavy artifacts (frames, masks, overlays) live in MinIO/S3. The `Checkpoint` and `StageOutput` tables in Postgres (see `02-data-model.svg`) hold structured outputs (detections, stats, references to S3 keys) — never blobs. Frame caches keyed by `timeline_id` are written by the first run of `extract_frames` and reused by every later replay on the same timeline.

## Storage module

`core/storage/` exposes the small set of helpers callers need:

```python
from core.storage import (
    get_s3_client,
    list_objects,
    download_file,
    download_to_temp,
    upload_file,
    get_presigned_url,
    BUCKET_IN,
    BUCKET_OUT,
)
```

Anything else (multipart, lifecycle, versioning) is the bucket's responsibility, not the application's.
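As a usage sketch (not lifted from the codebase): the example below assumes the helpers take `(bucket, key)`-style arguments and that `download_to_temp` returns a local path; the key layout, `job_id`, and the `expires_in` parameter name are all illustrative.

```python
# Hypothetical stage code; only the imported names come from core.storage —
# the signatures and key layout shown here are assumptions, not the real API.
from core.storage import (
    BUCKET_IN,
    BUCKET_OUT,
    download_to_temp,
    get_presigned_url,
    upload_file,
)

job_id = "job-0001"                            # illustrative job identifier
source_key = "chunks/2025-04-15/match-01.mp4"  # relative key, as stored in the DB

# Pull the source chunk to a local temp file for the frame extractor to read.
local_path = download_to_temp(BUCKET_IN, source_key)

# ... extract frames, render a debug overlay to /tmp/overlay-000120.png ...

# Write the per-job artifact back under the out bucket.
overlay_key = f"jobs/{job_id}/overlays/frame-000120.png"
upload_file(BUCKET_OUT, overlay_key, "/tmp/overlay-000120.png")

# Hand the UI a short-lived URL instead of streaming bytes through the API.
url = get_presigned_url(BUCKET_OUT, overlay_key, expires_in=3600)
```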
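For orientation, here is roughly what wiring the environment variables above into boto3 looks like — a minimal sketch, not the actual `get_s3_client` implementation in `core/storage/`:

```python
import os

import boto3


def make_s3_client():
    """Sketch of an env-driven S3 client (what get_s3_client plausibly does).

    With S3_ENDPOINT_URL unset (plain AWS S3), endpoint_url is None and
    boto3 falls back to the default AWS endpoint; against MinIO or GCS it
    points at the values shown in the config blocks above. Credentials
    (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY) are read from the
    environment by boto3 itself.
    """
    return boto3.client(
        "s3",
        endpoint_url=os.environ.get("S3_ENDPOINT_URL"),
        region_name=os.environ.get("AWS_REGION"),
    )
```

This is why the same code runs unchanged against MinIO and any cloud provider: the client is the only place the environment differs.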