mediaproc/docs/architecture/04-media-storage.md

Media & Artifact Storage

Overview

MPR stores everything on S3-compatible object storage. Locally that's MinIO; in any cloud target (AWS, GCS via HMAC, Cloudflare R2, etc.) it's the provider's S3 API. The code in core/storage/ uses boto3 throughout — only the endpoint URL and credentials change between environments.

What goes where

| Bucket / prefix | Contents | Producer | Consumer |
|---|---|---|---|
| mpr-media-in | Source video files (chunks the user uploaded or device-recorded) | user / chunker UI | extract_frames stage, core/api/detect/sources.py |
| mpr-media-out | Per-job artifacts: extracted frame caches, debug overlays | pipeline stages, core/api/detect/replay.py overlay endpoints | UI panels (frame strip, overlay viewer) |

Both buckets live behind the same S3 client (core/storage/). DB rows store relative keys (e.g. chunks/2025-04-15/match-01.mp4); the bucket is implicit.
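For illustration, the relative-key convention can be sketched as below; make_chunk_key and object_uri are hypothetical helper names for this sketch, not part of the core/storage API:

```python
# Sketch of the relative-key convention: DB rows store keys like
# "chunks/2025-04-15/match-01.mp4"; the bucket is supplied at access time.
# make_chunk_key and object_uri are hypothetical names, not the real API.
from datetime import date

BUCKET_IN = "mpr-media-in"

def make_chunk_key(day: date, name: str) -> str:
    """Relative key as stored in the DB row (bucket is implicit)."""
    return f"chunks/{day.isoformat()}/{name}"

def object_uri(bucket: str, key: str) -> str:
    """Resolve a relative key against a bucket only at access time."""
    return f"s3://{bucket}/{key}"

key = make_chunk_key(date(2025, 4, 15), "match-01.mp4")
uri = object_uri(BUCKET_IN, key)  # → "s3://mpr-media-in/chunks/2025-04-15/match-01.mp4"
```

Keeping the bucket out of the stored key is what lets the same rows work against MinIO locally and any cloud bucket in production.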

Local development (MinIO)

S3_ENDPOINT_URL=http://minio:9000
S3_BUCKET_IN=mpr-media-in
S3_BUCKET_OUT=mpr-media-out
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin

In the Tilt setup, MinIO runs as a k8s Deployment with port-forwards for 9000 (S3 API) and 9001 (web console). A minio-init job creates the buckets on first start.
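Since only the endpoint and credentials differ between environments, the client construction reduces to a small amount of glue. A minimal sketch, with a hypothetical client_kwargs helper (not the real core/storage code):

```python
# Sketch: turn the environment above into boto3 client kwargs.
# client_kwargs is a hypothetical helper, not the real core/storage code.
def client_kwargs(env: dict) -> dict:
    kwargs = {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }
    # MinIO / GCS / R2 set an explicit endpoint; plain AWS S3 omits it
    # and boto3 resolves the regional endpoint itself.
    if env.get("S3_ENDPOINT_URL"):
        kwargs["endpoint_url"] = env["S3_ENDPOINT_URL"]
    return kwargs

minio = client_kwargs({
    "S3_ENDPOINT_URL": "http://minio:9000",
    "AWS_ACCESS_KEY_ID": "minioadmin",
    "AWS_SECRET_ACCESS_KEY": "minioadmin",
})
# boto3.client("s3", **minio) would then talk to MinIO; with no
# S3_ENDPOINT_URL set, the same call targets AWS S3 directly.
```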

Cloud (AWS S3 / GCS / others)

# AWS S3 — no endpoint URL needed
S3_BUCKET_IN=...
S3_BUCKET_OUT=...
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...

# GCS via HMAC
S3_ENDPOINT_URL=https://storage.googleapis.com
AWS_ACCESS_KEY_ID=<gcs hmac access>
AWS_SECRET_ACCESS_KEY=<gcs hmac secret>

Database vs. object storage

Heavy artifacts (frames, masks, overlays) live in MinIO/S3. The Checkpoint and StageOutput tables in Postgres (see 02-data-model.svg) hold structured outputs (detections, stats, references to S3 keys) — never blobs. Frame caches keyed by timeline_id are written by the first run of extract_frames and reused by every later replay on the same timeline.
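The reuse contract only needs the cache keys to be deterministic per timeline, so a replay computes the same keys as the original run. An illustrative scheme — the frame-cache/ prefix and zero-padded index are assumptions for this sketch, not the real layout:

```python
# Illustrative frame-cache keying: deterministic per timeline_id, so every
# replay recomputes the same keys. Prefix and layout are assumptions.
def frame_cache_prefix(timeline_id: str) -> str:
    return f"frame-cache/{timeline_id}/"

def frame_key(timeline_id: str, frame_idx: int) -> str:
    # Zero-padded index keeps object listings sorted in frame order.
    return f"{frame_cache_prefix(timeline_id)}{frame_idx:06d}.jpg"

k = frame_key("tl-42", 7)  # → "frame-cache/tl-42/000007.jpg"
```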

Storage module

core/storage/ exposes the small set of helpers callers need:

from core.storage import (
    get_s3_client,
    list_objects,
    download_file,
    download_to_temp,
    upload_file,
    get_presigned_url,
    BUCKET_IN,
    BUCKET_OUT,
)

Anything else (multipart, lifecycle, versioning) is the bucket's responsibility, not the application's.