OVERVIEW
A guided tour of the platform — start here for narrative context before the diagrams.
What MPR is
MPR is a brand / logo / text detection pipeline for video. A user assembles chunks of source material into a Timeline, then runs a Profile (pipeline topology + per-stage config) against it. The pipeline extracts frames, filters scenes, runs CV (field segmentation, edge detection) and detection (YOLO, OCR), resolves text to a session brand list, and escalates anything still unresolved to a local VLM and then to cloud VLM providers. Output is a brand timeline and per-brand stats.
Where things run
The architecture spans four boxes: the browser (Vue 3 detection-app + OpenCV WASM worker for fast CV iteration), the K8s cluster (Envoy Gateway, FastAPI, detection-ui, Postgres, Redis, MinIO — Kind in dev via Tilt), a separate GPU host on the LAN running the inference server (YOLO, OCR, local VLM), and cloud VLM providers (Anthropic, Gemini, OpenAI, Groq) for last-resort escalation. See System.
Replay loop
The system is built around iteration. Checkpoint rows form a tree of "what configs did we try at this stage" (no blobs); StageOutput is a flat upsert table holding each stage's output dict. A single stage can be re-run in place using upstream StageOutput rows, so the UI loop is "tweak config → replay one stage → look at the overlay" without rerunning the whole pipeline. Frame caches keyed by timeline_id are reused across replays.
Profiles, not overrides
Profiles live in Postgres as two JSONB blobs — pipeline (stages + edges + routing) and configs (per-stage parameters). The convention is to duplicate a profile and tweak it, not to layer overrides at the call site. Job-level config_overrides exist but are merged on top of the resolved profile in core/detect/graph/nodes.py.
Inference indirection
Every CV/ML stage takes an INFERENCE_URL argument. When it is empty (the dev default), CV runs in-process; when it is set, the stage POSTs to core/gpu/server.py on the GPU host. Heavy ML deps (torch, transformers, paddleocr) live only in core/gpu/pyproject.toml — the API host doesn't need them.
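The indirection can be sketched as below. The `run_ocr` wrapper, the local stub, and the request payload shape are illustrative assumptions; only the `/ocr` route and the empty-means-in-process convention come from the docs.

```python
import os
import requests


def run_ocr_local(frame_bytes: bytes) -> dict:
    # Stand-in for the in-process OCR routine (the real one lives in core).
    return {"texts": [], "source": "in-process"}


def run_ocr(frame_bytes: bytes) -> dict:
    inference_url = os.environ.get("INFERENCE_URL", "")
    if not inference_url:
        # Dev default: run the CV routine inside the API process.
        return run_ocr_local(frame_bytes)
    # Otherwise POST the frame to the GPU host (core/gpu/server.py).
    resp = requests.post(f"{inference_url}/ocr",
                         files={"frame": frame_bytes}, timeout=60)
    resp.raise_for_status()
    return resp.json()
```

Because the switch is a single environment variable, the same stage code serves the laptop-only dev setup and the shared GPU host.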
API and SSE
FastAPI under /detect/* (core/api/detect/): sources, run/stop/pause/resume/step, status, replay, checkpoints, overlays, config. Pipeline events fan out through Redis to GET /detect/stream/{job_id} as SSE. Envoy keeps the SSE connection open for up to 3600s.
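A client consumes the stream by reading `data:` lines off the SSE response. A minimal parser sketch follows; the event payload shape (`stage`, `status`) is an assumption, not the actual event schema.

```python
import json


def parse_sse(lines):
    """Yield decoded JSON payloads from raw SSE lines (ignores blanks/comments)."""
    for line in lines:
        if line.startswith("data: "):
            yield json.loads(line[len("data: "):])


# Example raw lines as they might arrive from GET /detect/stream/{job_id}.
raw = [
    'data: {"stage": "extract_frames", "status": "running"}',
    "",
    'data: {"stage": "extract_frames", "status": "done"}',
]
events = list(parse_sse(raw))
```

In practice the lines would come from an iterator over the open HTTP response, which Envoy keeps alive for up to 3600s.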
Codegen
Source-of-truth dataclasses live in core/schema/models/. The standalone modelgen tool emits SQLModel ORM (core/db/models.py), Pydantic schemas, TypeScript types, and Protobuf definitions. Regenerate everything with bash ctrl/generate.sh.
SYSTEM ARCHITECTURE
Browser ↔ Envoy Gateway ↔ FastAPI / detection-ui ↔ data plane (Postgres / Redis / MinIO) ↔ LAN GPU host ↔ cloud VLM providers.
DETECTION PIPELINE
11 named stages from core/detect/graph/nodes.py. The runner flattens the profile's PipelineConfig graph into a linear sequence and runs each stage with cancel / pause / resume / step control.
Control flow. Each stage runs inside trace_node(), emits running → done/skipped via core/detect/emit.py, and writes its result to a StageOutput row keyed by (job_id, stage_name). Between stages the runner checks three job-keyed flags: cancel (set_cancel_check), pause/resume (threading.Event), and pause-after-stage / step.
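The between-stage checks can be sketched as a toy runner. The class, flag names, and step handling here are illustrative, not the real core/detect runner.

```python
import threading


class StageRunner:
    """Toy runner showing cancel / pause-resume / step checks between stages."""

    def __init__(self, stages):
        self.stages = stages
        self.cancelled = False            # real runner polls this via set_cancel_check
        self.resume_event = threading.Event()
        self.resume_event.set()           # cleared on pause, set on resume
        self.step_mode = False            # pause-after-stage / step behavior

    def run(self, job):
        results = []
        for stage in self.stages:
            if self.cancelled:            # cancel check between stages
                break
            self.resume_event.wait()      # blocks here while paused
            results.append(stage(job))
            if self.step_mode:
                self.resume_event.clear()  # next iteration blocks until resume
        return results
```

The real runner additionally wraps each stage in trace_node(), emits status events, and persists each result to StageOutput.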
Skip flags. SKIP_VLM=1 emits skipped for escalate_vlm; SKIP_CLOUD=1 for escalate_cloud. Useful in CI and dev when you don't want to burn provider credits.
PROFILES & CHECKPOINTS
Profiles are the config mechanism; checkpoints + StageOutput power the replay loop.
Profile shape
One Profile row per content type (e.g. soccer_broadcast) holds two JSONB blobs:
- pipeline — a PipelineConfig: stages + edges + routing rules. The runner topologically sorts the edges, falling back to stage order when no edges are defined.
- configs — {stage_name: {...}} per-stage parameters: fps, thresholds, prompts, etc. Each stage parses its slice into a typed config (FrameExtractionConfig, OCRConfig, ...).
Convention: duplicate a profile and tweak it rather than patching defaults at the call site. Job-level config_overrides exist for one-off experiments but the resolved profile is the durable artifact.
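An illustrative shape for the two blobs, plus a minimal override merge. The stage names, parameters, and the `resolve_configs` helper are hypothetical; the actual merge in core/detect/graph/nodes.py may differ.

```python
profile = {
    "pipeline": {
        "stages": ["extract_frames", "run_ocr", "match_brands"],
        "edges": [["extract_frames", "run_ocr"], ["run_ocr", "match_brands"]],
        "routing": {},
    },
    "configs": {
        "extract_frames": {"fps": 2},
        "run_ocr": {"min_confidence": 0.5},
        "match_brands": {"threshold": 0.8},
    },
}


def resolve_configs(profile, config_overrides):
    # Start from the profile's per-stage configs, then layer job-level
    # overrides on top, without mutating the stored profile.
    merged = {stage: dict(cfg) for stage, cfg in profile["configs"].items()}
    for stage, patch in (config_overrides or {}).items():
        merged.setdefault(stage, {}).update(patch)
    return merged
```

The duplicate-and-tweak convention means the profile row itself, not a pile of call-site overrides, records what actually ran.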
Checkpoint tree
A Checkpoint row is a tree node: (parent_id, stage_name, config_overrides, stats). No blobs. Lets the UI show a branching history of "what configs did we try at this stage" without dragging frame data around.
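Reconstructing one branch of that tree is a walk up the parent_id links. The dict-of-rows shape below is illustrative stand-in data, not the ORM model.

```python
def lineage(checkpoints, leaf_id):
    # Follow parent_id links from a leaf back to the root, then reverse
    # so the path reads root -> leaf.
    path = []
    node_id = leaf_id
    while node_id is not None:
        path.append(node_id)
        node_id = checkpoints[node_id]["parent_id"]
    return list(reversed(path))
```

Because rows carry only config and stats, walking or diffing branches is cheap: no frame data moves.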
StageOutput (flat upsert)
One row per (job_id, stage_name) holding the stage's output dict. Single-stage replay reads upstream outputs from here, so re-running match_brands with a tweaked threshold doesn't redo OCR. POST /replay-stage is the entry point.
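The upsert semantics in miniature, using an in-memory dict in place of the Postgres table; the function names are illustrative.

```python
# One entry per (job_id, stage_name): re-running a stage overwrites
# its previous output instead of appending a new row.
stage_outputs = {}


def upsert_stage_output(job_id, stage_name, output):
    stage_outputs[(job_id, stage_name)] = output


def upstream_output(job_id, stage_name):
    return stage_outputs.get((job_id, stage_name))
```

Replay of match_brands then reads the stored run_ocr output via the same key, rather than re-running OCR.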
Replay loop
The detection-app UI is the test surface: change a config, replay one stage, see the overlay rendered from the cached frame plus the new StageOutput. Frame caches keyed by timeline_id survive across replays — extract_frames only fires on the first run for a timeline.
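The timeline-keyed cache behavior can be sketched as follows; the cache dict and the `extract` callable are stand-ins for the real frame store.

```python
# Frames are cached per timeline_id, so extraction runs only once per
# timeline and every replay after that is a cache hit.
frame_cache = {}


def get_frames(timeline_id, extract):
    if timeline_id not in frame_cache:
        frame_cache[timeline_id] = extract(timeline_id)
    return frame_cache[timeline_id]
```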
INFERENCE TOPOLOGY
Stages can run in three places. The split is what keeps the dev box light and lets one GPU host serve the whole team.
Browser (OpenCV WASM)
Field and edge stages can run in a Web Worker via ui/detection-app/src/cv/wasmBridge.ts using OpenCV WASM directly — no TypeScript ports of the algorithms. This is the fast-iteration path for the replay loop: tweak a kernel size, rerun the stage on the cached frames, see the overlay update without touching a server.
API host (in-process)
With INFERENCE_URL="" (the dev default in ctrl/k8s/base/configmap.yaml) every CV/ML stage calls its routine in-process. Useful when there's no GPU host wired up; works for everything except heavy YOLO/VLM workloads.
GPU host (LAN)
Set INFERENCE_URL=http://gpu-host:8000 and the same stages POST to core/gpu/server.py. The GPU server exposes /detect, /ocr, /preprocess, /vlm, /detect_edges, /segment_field — each with a /debug variant that returns intermediate masks for the overlay viewer. Heavy ML deps live only in core/gpu/pyproject.toml; the API host doesn't import torch.
Cloud VLM providers
Last-resort escalation for unresolved candidates. core/detect/providers/ wraps Anthropic, Gemini, OpenAI, and Groq. Selection is per-profile config; SKIP_CLOUD=1 bypasses the stage entirely.
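A sketch of the stage's behavior under the description above. The first-provider-to-resolve-wins rule and the provider callable shape are assumptions; only the per-profile ordering and the SKIP_CLOUD bypass come from the docs.

```python
import os


def escalate_cloud(candidate, providers):
    # SKIP_CLOUD=1 bypasses the stage entirely (it emits "skipped").
    if os.environ.get("SKIP_CLOUD") == "1":
        return None
    for provider in providers:        # providers in per-profile config order
        result = provider(candidate)
        if result is not None:        # assumed: first resolved answer wins
            return result
    return None
```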
DATA MODEL
Tables generated by modelgen from core/schema/models/ into core/db/models.py (SQLModel).
- MediaAsset — source video file with probe metadata (duration, fps, codec).
- Profile — pipeline topology + per-stage config (JSONB).
- Timeline — user-created selection of chunks from a source asset.
- Job — one pipeline run on a timeline; parent_id chains replays into a tree.
- Checkpoint — tree node of stage state, no blobs.
- StageOutput — flat upsert per (job, stage), holds output JSONB and an optional checkpoint_id.
- Brand — canonical name, aliases, source (ocr/local_vlm/cloud_llm/manual), airing history.
API
FastAPI under /detect/* (mounted from core/api/detect/). Through Envoy Gateway in dev the public path is /api/detect/...; /api/detect/stream/* gets an extended idle timeout for SSE.
# Sources / timelines
GET    /sources
GET    /sources/{job_id}/chunks
POST   /timeline
GET    /timeline
GET    /timeline/{id}
DELETE /timeline/{id}/cache

# Run control
POST /run
POST /stop/{job_id}
POST /pause/{job_id}
POST /resume/{job_id}
POST /step/{job_id}
POST /pause-after-stage/{job_id}
GET  /status/{job_id}
POST /clear/{job_id}

# Live events
GET /stream/{job_id}   # SSE

# Replay / checkpoints / overlays
GET  /checkpoints/{timeline_id}
GET  /checkpoints/{timeline_id}/{stage}
GET  /scenarios
POST /replay
POST /replay-stage
POST /overlays
GET  /overlays/{timeline_id}/{job_id}/{stage}/{seq}

# Config
GET /config
PUT /config
GET /config/profiles
GET /config/profiles/{name}/pipeline
PUT /config/edge-transform
GET /config/stages
GET /config/stages/{stage_name}

# Jobs
GET /jobs
GET /jobs/{id}
STORAGE
S3-compatible everywhere — MinIO locally, real S3 / GCS / R2 in cloud targets. The same boto3 code path serves both; only S3_ENDPOINT_URL and credentials change.
- mpr-media-in — source video files (chunks).
- mpr-media-out — per-job artifacts: extracted frame caches, debug overlays.
Heavy artifacts (frames, masks, overlays) live in object storage. Checkpoint and StageOutput rows in Postgres hold structured outputs and references to S3 keys, never blobs.
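A sketch of the single boto3 code path: only the endpoint and credentials differ between MinIO and real S3. The kwargs helper and environment variable fallbacks are illustrative, not the actual storage module.

```python
import os


def s3_client_kwargs():
    """Build boto3 client kwargs; only endpoint + creds vary per target."""
    return {
        # Empty/unset S3_ENDPOINT_URL -> None -> real AWS S3 endpoints.
        "endpoint_url": os.environ.get("S3_ENDPOINT_URL") or None,
        "aws_access_key_id": os.environ.get("AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": os.environ.get("AWS_SECRET_ACCESS_KEY"),
        "region_name": os.environ.get("AWS_REGION", "us-east-1"),
    }


def make_s3_client():
    import boto3  # imported lazily; heavy dep not needed to build kwargs
    return boto3.client("s3", **s3_client_kwargs())
```

With `S3_ENDPOINT_URL=http://localhost:9000` the same client talks to local MinIO; unset, it talks to AWS.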
CODE GENERATION
Source-of-truth dataclasses in core/schema/models/ → typed code in four targets.
- SQLModel ORM tables → core/db/models.py
- Pydantic schemas (API request / response models)
- TypeScript types (UI)
- Protobuf definitions (gRPC stubs in core/rpc/)
# regenerate everything
bash ctrl/generate.sh
DEV ENVIRONMENT
Tilt + Kind for local dev. Routing via Envoy Gateway on port 8080 — no nginx-ingress.
The Tiltfile lives at ctrl/Tiltfile and applies the kustomize overlay ctrl/k8s/overlays/dev/. Cluster name: kind-mpr. Tilt port-forwards Envoy (8080) and MinIO (9000 API, 9001 console).
- /api/detect/stream/* → FastAPI SSE (3600s idle timeout)
- /api/* → FastAPI
- /, /detection/* → detection-ui (with WS upgrade for Vite HMR)
# Add to /etc/hosts
127.0.0.1 mpr.local.ar k8s.mpr.local.ar

# Bring the cluster up
cd ctrl
./kind-create.sh   # one-time
tilt up            # builds + applies + port-forwards

# UI:    http://k8s.mpr.local.ar:8080/
# API:   http://k8s.mpr.local.ar:8080/api/
# MinIO: http://localhost:9001 (console; admin / minioadmin)

# Force a UI rebuild
tilt trigger detection-ui
QUICK REFERENCE
Common commands and switches for working in MPR.
# Render SVGs from DOT files
for f in docs/architecture/*.dot; do dot -Tsvg "$f" -o "${f%.dot}.svg"; done

# Regenerate models from core/schema/models/
bash ctrl/generate.sh

# Switch inference between local and GPU host
INFERENCE_URL=                       # local (CV runs in API process)
INFERENCE_URL=http://gpu-host:8000   # remote (core/gpu/server.py)

# Skip VLM escalation paths
SKIP_VLM=1
SKIP_CLOUD=1

# Tilt
cd ctrl && tilt up
tilt trigger detection-ui
Reference docs: