# Media & Artifact Storage

## Overview

MPR stores all media and heavy artifacts on **S3-compatible** object storage. Locally that's MinIO; in any
cloud target (AWS, GCS via HMAC, Cloudflare R2, etc.) it's the provider's S3 API. The
code in `core/storage/` uses boto3 throughout; only the endpoint URL and credentials
change between environments.

## What goes where

| Bucket / prefix | Contents | Producer | Consumer |
|---|---|---|---|
| `mpr-media-in` | Source video files (chunks the user uploaded or recorded on a device) | user / chunker UI | `extract_frames` stage, `core/api/detect/sources.py` |
| `mpr-media-out` | Per-job artifacts: extracted frame caches, debug overlays | pipeline stages, overlay endpoints in `core/api/detect/replay.py` | UI panels (frame strip, overlay viewer) |

Both buckets are accessed through the same S3 client (`core/storage/`). DB rows store
keys relative to the bucket (e.g. `chunks/2025-04-15/match-01.mp4`); the bucket itself is
implicit and is not stored in the row.

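To make the "relative key, implicit bucket" convention concrete, here is a minimal sketch of how a stored key could be turned back into a full object reference. `ObjectRef`, `source_ref`, and `artifact_ref` are hypothetical names used only for illustration, and the bucket constants are hard-coded to the default names from the table rather than read from configuration.

```python
from dataclasses import dataclass

# Default bucket names from the table; the real module presumably reads
# them from S3_BUCKET_IN / S3_BUCKET_OUT (assumption).
BUCKET_IN = "mpr-media-in"
BUCKET_OUT = "mpr-media-out"


@dataclass(frozen=True)
class ObjectRef:
    """Hypothetical pairing of a bucket with a bucket-relative key."""

    bucket: str
    key: str

    @property
    def uri(self) -> str:
        return f"s3://{self.bucket}/{self.key}"


def source_ref(relative_key: str) -> ObjectRef:
    """Source media always resolves against the input bucket."""
    return ObjectRef(BUCKET_IN, relative_key.lstrip("/"))


def artifact_ref(relative_key: str) -> ObjectRef:
    """Per-job artifacts always resolve against the output bucket."""
    return ObjectRef(BUCKET_OUT, relative_key.lstrip("/"))


if __name__ == "__main__":
    print(source_ref("chunks/2025-04-15/match-01.mp4").uri)
```

Because the caller already knows whether it is handling source media or a job artifact, the row only needs the key; renaming a bucket then becomes a configuration change rather than a data migration.
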
## Local development (MinIO)

```bash
S3_ENDPOINT_URL=http://minio:9000
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin
```

In the Tilt setup, MinIO runs as a k8s Deployment with port-forwards for `9000` (S3 API)
and `9001` (web console). A `minio-init` job creates the buckets on first start.

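The bucket bootstrap step is idempotent by nature: create what is missing, leave the rest alone. The sketch below shows that logic in Python against a boto3-style S3 client. It is not the actual `minio-init` job, whose implementation is not shown here; `ensure_buckets` is a hypothetical helper, and it assumes a local MinIO target where `create_bucket` needs no region configuration.

```python
from typing import Iterable, List


def ensure_buckets(client, names: Iterable[str]) -> List[str]:
    """Create every bucket in `names` that does not exist yet.

    `client` is expected to behave like boto3.client("s3"): it must offer
    list_buckets() and create_bucket(Bucket=...). Returns the names that
    were actually created, so a second run returns an empty list.
    """
    existing = {b["Name"] for b in client.list_buckets().get("Buckets", [])}
    created = []
    for name in names:
        if name not in existing:
            client.create_bucket(Bucket=name)
            created.append(name)
    return created


# Usage against local MinIO (requires boto3 and a running MinIO):
#   import boto3
#   client = boto3.client("s3", endpoint_url="http://localhost:9000",
#                         aws_access_key_id="minioadmin",
#                         aws_secret_access_key="minioadmin")
#   ensure_buckets(client, ["mpr-media-in", "mpr-media-out"])
```

On AWS itself, creating a bucket outside `us-east-1` also requires a `CreateBucketConfiguration`, so this sketch is only meant for the local setup.
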
## Cloud (AWS S3 / GCS / others)

```bash
# AWS S3: no endpoint URL needed
S3_BUCKET_IN=...
S3_BUCKET_OUT=...
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...

# GCS via HMAC
S3_ENDPOINT_URL=https://storage.googleapis.com
AWS_ACCESS_KEY_ID=<gcs hmac access>
AWS_SECRET_ACCESS_KEY=<gcs hmac secret>
```

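The environment blocks differ mainly in whether `S3_ENDPOINT_URL` is set. A minimal sketch of how those variables could map onto `boto3.client` arguments follows. `s3_client_kwargs` and `make_client` are hypothetical names, not the actual contents of `core/storage/`; the keyword arguments themselves (`endpoint_url`, `region_name`, `aws_access_key_id`, `aws_secret_access_key`) are standard boto3 client parameters.

```python
import os
from typing import Dict, Mapping, Optional


def s3_client_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Translate env-style settings into keyword arguments for boto3.client("s3")."""
    env = os.environ if env is None else env
    # Map each variable to the boto3 argument it feeds; unset or empty
    # variables are skipped so boto3 falls back to its own defaults
    # (for AWS S3 that means no custom endpoint).
    mapping = {
        "S3_ENDPOINT_URL": "endpoint_url",
        "AWS_REGION": "region_name",
        "AWS_ACCESS_KEY_ID": "aws_access_key_id",
        "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
    }
    return {arg: env[var] for var, arg in mapping.items() if env.get(var)}


def make_client(env: Optional[Mapping[str, str]] = None):
    """Build the client; boto3 is imported lazily so the helper above has no dependencies."""
    import boto3

    return boto3.client("s3", **s3_client_kwargs(env))
```

With the MinIO or GCS settings the result includes `endpoint_url`; with the AWS settings it does not, which is the only structural difference between the targets.
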
## Database vs. object storage

Heavy artifacts (frames, masks, overlays) live in MinIO/S3. The `Checkpoint` and
`StageOutput` tables in Postgres (see `02-data-model.svg`) hold only structured outputs
(detections, stats, references to S3 keys), never blobs. Frame caches keyed by
`timeline_id` are written by the first run of `extract_frames` and reused by every
later replay on the same timeline.

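The write-once, reuse-later behaviour of the frame cache can be sketched as a simple check-then-populate function. Everything here is illustrative: the `frames/<timeline_id>/` key layout, the `.jpg` suffix, and both function names are assumptions, and a plain dict stands in for the output bucket.

```python
from typing import Callable, Dict, List


def frame_cache_prefix(timeline_id: str) -> str:
    """Hypothetical key prefix under which one timeline's frames are cached."""
    return f"frames/{timeline_id}/"


def get_or_extract_frames(
    store: Dict[str, bytes],
    timeline_id: str,
    extract: Callable[[], List[bytes]],
) -> List[str]:
    """Return cached frame keys, running `extract` only when the cache is empty."""
    prefix = frame_cache_prefix(timeline_id)
    cached = sorted(key for key in store if key.startswith(prefix))
    if cached:
        # A later replay on the same timeline: reuse what the first run wrote.
        return cached
    keys = []
    for index, frame in enumerate(extract()):
        # Zero-padded index keeps lexicographic order equal to frame order.
        key = f"{prefix}{index:06d}.jpg"
        store[key] = frame
        keys.append(key)
    return keys
```

The database side of this would hold only the returned keys (or the prefix), in line with the rule that Postgres rows reference objects rather than contain them.
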
## Storage module

`core/storage/` exposes the small set of helpers callers need:

```python
from core.storage import (
    get_s3_client,
    list_objects,
    download_file,
    download_to_temp,
    upload_file,
    get_presigned_url,
    BUCKET_IN,
    BUCKET_OUT,
)
```

Anything else (multipart, lifecycle, versioning) is the bucket's responsibility, not
the application's.

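The signatures of these helpers are not shown here, so the following is only a sketch of the boto3 calls that helpers such as `get_presigned_url` and `download_to_temp` might wrap. The function names and parameters below are hypothetical; `generate_presigned_url` and `download_file` are real boto3 S3 client methods.

```python
import os
import tempfile


def presigned_get_url(client, bucket: str, key: str, expires_in: int = 3600) -> str:
    """Return a time-limited GET URL for one object (expiry in seconds)."""
    return client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


def download_to_temp_file(client, bucket: str, key: str) -> str:
    """Download one object to a new temporary file and return its path.

    The key's extension is kept so tools that sniff file types by suffix
    still work. The caller is responsible for deleting the file.
    """
    suffix = os.path.splitext(key)[1]
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # boto3 reopens the path itself
    client.download_file(bucket, key, path)
    return path
```

A UI panel would typically consume the presigned URL directly, while a pipeline stage that needs local file access would use the temp-file variant.
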