MPR

Media Processing & Detection Pipeline — Architecture

OVERVIEW

A guided tour of the platform — start here for narrative context before the diagrams.

What MPR is

MPR is a brand / logo / text detection pipeline for video. A user assembles chunks of source material into a Timeline, then runs a Profile (pipeline topology + per-stage config) against it. The pipeline extracts frames, filters scenes, runs CV (field segmentation, edge detection) and detection (YOLO, OCR), resolves text against a session brand list, and escalates anything still unresolved first to a local VLM and then to cloud VLM providers. Output is a brand timeline and per-brand stats.

Where things run

The architecture spans four boxes: the browser (Vue 3 detection-app + OpenCV WASM worker for fast CV iteration), the K8s cluster (Envoy Gateway, FastAPI, detection-ui, Postgres, Redis, MinIO — Kind in dev via Tilt), a separate GPU host on the LAN running the inference server (YOLO, OCR, local VLM), and cloud VLM providers (Anthropic, Gemini, OpenAI, Groq) for last-resort escalation. See System.

Replay loop

The system is built around iteration. Checkpoint rows form a tree of "what configs did we try at this stage" (no blobs); StageOutput is a flat upsert table holding each stage's output dict. A single stage can be re-run in place using upstream StageOutput rows, so the UI loop is "tweak config → replay one stage → look at the overlay" without rerunning the whole pipeline. Frame caches keyed by timeline_id are reused across replays.

Profiles, not overrides

Profiles live in Postgres as two JSONB blobs — pipeline (stages + edges + routing) and configs (per-stage parameters). The convention is to duplicate a profile and tweak it, not to layer overrides at the call site. Job-level config_overrides exist but are merged on top of the resolved profile in core/detect/graph/nodes.py.

Inference indirection

Every CV/ML stage takes an INFERENCE_URL argument. When it is empty (the dev default), CV runs in-process; when it is set, the stage POSTs to core/gpu/server.py on the GPU host. Heavy ML deps (torch, transformers, paddleocr) live only in core/gpu/pyproject.toml — the API host doesn't need them.

API and SSE

FastAPI under /detect/* (core/api/detect/): sources, run/stop/pause/resume/step, status, replay, checkpoints, overlays, config. Pipeline events fan out through Redis to GET /detect/stream/{job_id} as SSE. Envoy keeps the SSE connection open for up to 3600s.

Codegen

Source-of-truth dataclasses live in core/schema/models/. The standalone modelgen tool emits SQLModel ORM (core/db/models.py), Pydantic schemas, TypeScript types, and Protobuf definitions. Regenerate everything with bash ctrl/generate.sh.

SYSTEM ARCHITECTURE

Browser ↔ Envoy Gateway ↔ FastAPI / detection-ui ↔ data plane (Postgres / Redis / MinIO) ↔ LAN GPU host ↔ cloud VLM providers.

[Diagram: System Architecture — Browser / K8s cluster / GPU host (LAN) / Cloud VLM]

DETECTION PIPELINE

11 named stages from core/detect/graph/nodes.py. The runner flattens the profile's PipelineConfig graph into a linear sequence and runs each stage with cancel / pause / resume / step control.

[Diagram: Detection Pipeline — stage legend: Browser / WASM-eligible · GPU inference · Cloud VLM]

Control flow. Each stage runs inside trace_node(), emits running / done / skipped via core/detect/emit.py, and writes its result to a StageOutput row keyed by (job_id, stage_name). Between stages the runner checks three job-keyed flags: cancel (set_cancel_check), pause/resume (threading.Event), and pause-after-stage / step.

Skip flags. SKIP_VLM=1 emits skipped for escalate_vlm; SKIP_CLOUD=1 for escalate_cloud. Useful in CI and dev when you don't want to burn provider credits.
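
A minimal sketch of the control loop and skip flags together — the Stage shape, helper names, and flag plumbing here are illustrative, not the actual core/detect runner:

# Sketch of the runner's stage loop. emit() and upsert_stage_output() are
# stubs; the real versions live in core/detect/emit.py and the DB layer.
import os
import threading
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    run: Callable[[str], dict]               # job_id -> output dict

def emit(job_id: str, stage: str, status: str) -> None:
    print(f"[{job_id}] {stage}: {status}")   # real events fan out via Redis

def upsert_stage_output(job_id: str, stage: str, output: dict) -> None:
    pass   # real code upserts a StageOutput row keyed by (job_id, stage_name)

SKIP_FLAGS = {"escalate_vlm": "SKIP_VLM", "escalate_cloud": "SKIP_CLOUD"}

def run_pipeline(job_id: str, stages: list[Stage],
                 cancel_requested: Callable[[str], bool],
                 pause_event: threading.Event) -> None:
    for stage in stages:                     # flattened, topologically sorted
        if cancel_requested(job_id):         # cancel flag (cf. set_cancel_check)
            break
        pause_event.wait()                   # cleared = paused; resume sets it
        flag = SKIP_FLAGS.get(stage.name)
        if flag and os.environ.get(flag) == "1":
            emit(job_id, stage.name, "skipped")
            continue
        emit(job_id, stage.name, "running")
        output = stage.run(job_id)           # wrapped in trace_node() in the real code
        upsert_stage_output(job_id, stage.name, output)
        emit(job_id, stage.name, "done")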

Full pipeline reference →

PROFILES & CHECKPOINTS

Profiles are the config mechanism; checkpoints + StageOutput power the replay loop.

Profile shape

One Profile row per content type (e.g. soccer_broadcast) holds two JSONB blobs:

  • pipeline — a PipelineConfig: stages + edges + routing rules. The runner topologically sorts the edges, falling back to stage order when no edges are defined.
  • configs — {stage_name: {...}} per-stage parameter maps: fps, thresholds, prompts, etc. Each stage parses its slice into a typed config (FrameExtractionConfig, OCRConfig, ...).

Convention: duplicate a profile and tweak it rather than patching defaults at the call site. Job-level config_overrides exist for one-off experiments but the resolved profile is the durable artifact.
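
A schematic of what the two blobs might hold for soccer_broadcast — extract_frames, match_brands, and escalate_vlm are stage names from this doc, but the exact keys and the run_ocr stage name are illustrative:

# Schematic Profile row (key names illustrative, not the exact
# PipelineConfig schema).
profile = {
    "name": "soccer_broadcast",
    "pipeline": {
        "stages": ["extract_frames", "run_ocr", "match_brands", "escalate_vlm"],
        "edges": [
            ["extract_frames", "run_ocr"],
            ["run_ocr", "match_brands"],
            ["match_brands", "escalate_vlm"],
        ],
        "routing": {"escalate_vlm": {"when": "unresolved_candidates"}},
    },
    "configs": {
        "extract_frames": {"fps": 2},              # parsed into FrameExtractionConfig
        "run_ocr": {"min_confidence": 0.6},        # parsed into OCRConfig
        "match_brands": {"fuzzy_threshold": 0.85},
    },
}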

Checkpoint tree

A Checkpoint row is a tree node: (parent_id, stage_name, config_overrides, stats). No blobs. Lets the UI show a branching history of "what configs did we try at this stage" without dragging frame data around.

StageOutput (flat upsert)

One row per (job_id, stage_name) holding the stage's output dict. Single-stage replay reads upstream outputs from here, so re-running match_brands with a tweaked threshold doesn't redo OCR. POST /replay-stage is the entry point.
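
For example, a sketch of replaying match_brands with a tightened threshold — the payload shape, ids, and config key are illustrative:

# Replay one stage; upstream StageOutput rows are reused, so OCR is not re-run.
import requests

resp = requests.post(
    "http://k8s.mpr.local.ar:8080/api/detect/replay-stage",
    json={
        "timeline_id": "tl_123",                       # hypothetical id
        "stage": "match_brands",
        "config_overrides": {"fuzzy_threshold": 0.9},  # hypothetical key
    },
)
resp.raise_for_status()
print(resp.json())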

Replay loop

The detection-app UI is the test surface: change a config, replay one stage, see the overlay rendered from the cached frame plus the new StageOutput. Frame caches keyed by timeline_id survive across replays — extract_frames only fires on the first run for a timeline.

INFERENCE TOPOLOGY

Stages can run in three places — browser, API host, GPU host — with cloud VLM providers as an external escalation target. The split is what keeps the dev box light and lets one GPU host serve the whole team.

Browser (OpenCV WASM)

Field and edge stages can run in a Web Worker via ui/detection-app/src/cv/wasmBridge.ts using OpenCV WASM directly — no TypeScript ports of the algorithms. This is the fast-iteration path for the replay loop: tweak a kernel size, rerun the stage on the cached frames, see the overlay update without touching a server.

API host (in-process)

With INFERENCE_URL="" (the dev default in ctrl/k8s/base/configmap.yaml) every CV/ML stage calls its routine in-process. Useful when there's no GPU host wired up; works for everything except heavy YOLO/VLM workloads.

GPU host (LAN)

Set INFERENCE_URL=http://gpu-host:8000 and the same stages POST to core/gpu/server.py. The GPU server exposes /detect, /ocr, /preprocess, /vlm, /detect_edges, /segment_field — each with a /debug variant that returns intermediate masks for the overlay viewer. Heavy ML deps live only in core/gpu/pyproject.toml; the API host doesn't import torch.
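
A sketch of the indirection as a stage might implement it — the request payload shape and helper names are assumptions; only the /detect_edges route is from the GPU server's documented surface:

# INFERENCE_URL dispatch sketch (payload shape illustrative).
import json
import os
import requests

def detect_edges_inprocess(frame_bytes: bytes, params: dict) -> dict:
    raise NotImplementedError   # stands in for the in-process OpenCV routine

def detect_edges(frame_bytes: bytes, params: dict) -> dict:
    url = os.environ.get("INFERENCE_URL", "")
    if not url:                                  # dev default: in-process CV
        return detect_edges_inprocess(frame_bytes, params)
    resp = requests.post(                        # remote: core/gpu/server.py
        f"{url}/detect_edges",
        files={"frame": frame_bytes},
        data={"params": json.dumps(params)},
    )
    resp.raise_for_status()
    return resp.json()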

Cloud VLM providers

Last-resort escalation for unresolved candidates. core/detect/providers/ wraps Anthropic, Gemini, OpenAI, and Groq. Selection is per-profile config; SKIP_CLOUD=1 bypasses the stage entirely.
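
A schematic of the provider seam — the Protocol, method name, and registry are illustrative, not the actual wrapper API in core/detect/providers/:

# Hypothetical provider interface; selection comes from the profile config.
from typing import Protocol

class VLMProvider(Protocol):
    name: str

    def resolve(self, image: bytes, candidates: list[str]) -> list[dict]:
        """Return brand resolutions for still-unresolved candidates."""
        ...

PROVIDERS: dict[str, VLMProvider] = {}   # "anthropic", "gemini", "openai", "groq"

def escalate_cloud(image: bytes, candidates: list[str], cfg: dict) -> list[dict]:
    provider = PROVIDERS[cfg["provider"]]   # per-profile selection
    return provider.resolve(image, candidates)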

DATA MODEL

Tables generated by modelgen from core/schema/models/ into core/db/models.py (SQLModel).

[Diagram: Data Model]
  • MediaAsset — source video file with probe metadata (duration, fps, codec).
  • Profile — pipeline topology + per-stage config (JSONB).
  • Timeline — user-created selection of chunks from a source asset.
  • Job — one pipeline run on a timeline; parent_id chains replays into a tree.
  • Checkpoint — tree node of stage state, no blobs.
  • StageOutput — flat upsert per (job, stage), holds output JSONB and an optional checkpoint_id.
  • Brand — canonical name, aliases, source (ocr/local_vlm/cloud_llm/manual), airing history.
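
As a rough picture of the generated output, a hand-written approximation of the StageOutput table — field names are inferred from this doc, not copied from core/db/models.py:

# Approximation of the generated SQLModel table (illustrative).
from typing import Optional
from sqlalchemy.dialects.postgresql import JSONB
from sqlmodel import Column, Field, SQLModel

class StageOutput(SQLModel, table=True):
    job_id: str = Field(primary_key=True)                # one row per (job, stage)
    stage_name: str = Field(primary_key=True)
    output: dict = Field(sa_column=Column(JSONB))        # the stage's output dict
    checkpoint_id: Optional[int] = Field(default=None)   # optional checkpoint link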

API

FastAPI under /detect/* (mounted from core/api/detect/). Through Envoy Gateway in dev the public path is /api/detect/...; /api/detect/stream/* gets an extended idle timeout for SSE.

# Sources / timelines
GET    /sources
GET    /sources/{job_id}/chunks
POST   /timeline
GET    /timeline
GET    /timeline/{id}
DELETE /timeline/{id}/cache

# Run control
POST   /run
POST   /stop/{job_id}
POST   /pause/{job_id}
POST   /resume/{job_id}
POST   /step/{job_id}
POST   /pause-after-stage/{job_id}
GET    /status/{job_id}
POST   /clear/{job_id}

# Live events
GET    /stream/{job_id}              # SSE

# Replay / checkpoints / overlays
GET    /checkpoints/{timeline_id}
GET    /checkpoints/{timeline_id}/{stage}
GET    /scenarios
POST   /replay
POST   /replay-stage
POST   /overlays
GET    /overlays/{timeline_id}/{job_id}/{stage}/{seq}

# Config
GET    /config
PUT    /config
GET    /config/profiles
GET    /config/profiles/{name}/pipeline
PUT    /config/edge-transform
GET    /config/stages
GET    /config/stages/{stage_name}

# Jobs
GET    /jobs
GET    /jobs/{id}
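
Consuming the stream from a script — a minimal sketch assuming events arrive as SSE data: lines carrying JSON; the job id is hypothetical:

# Minimal SSE consumer for /api/detect/stream/{job_id}.
import json
import requests

url = "http://k8s.mpr.local.ar:8080/api/detect/stream/job_123"
with requests.get(url, stream=True, timeout=3600) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line.startswith("data:"):
            print(json.loads(line[len("data:"):].strip()))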

STORAGE

S3-compatible everywhere — MinIO locally, real S3 / GCS / R2 in cloud targets. The same boto3 code path serves both; only S3_ENDPOINT_URL and credentials change.

  • mpr-media-in — source video files (chunks).
  • mpr-media-out — per-job artifacts: extracted frame caches, debug overlays.

Heavy artifacts (frames, masks, overlays) live in object storage. Checkpoint and StageOutput rows in Postgres hold structured outputs and references to S3 keys, never blobs.
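
The endpoint-only switch looks roughly like this — the bucket name is the real one above; the key layout and env fallback are illustrative:

# One boto3 code path: MinIO when S3_ENDPOINT_URL is set, AWS S3 when not.
import os
import boto3

s3 = boto3.client("s3", endpoint_url=os.environ.get("S3_ENDPOINT_URL") or None)
s3.upload_file(
    "frame_0001.jpg",                        # local file
    "mpr-media-out",                         # per-job artifact bucket
    "jobs/job_123/frames/frame_0001.jpg",    # hypothetical key layout
)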

Full storage reference →

CODE GENERATION

Source-of-truth dataclasses in core/schema/models/ → typed code in four targets.

  • SQLModel ORM tables → core/db/models.py
  • Pydantic schemas (API request / response models)
  • TypeScript types (UI)
  • Protobuf definitions (gRPC stubs in core/rpc/)
# regenerate everything
bash ctrl/generate.sh
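
To make the flow concrete, a guess at what a source-of-truth dataclass looks like — fields mirror the Brand table described above, but the real definitions live in core/schema/models/:

# Illustrative source dataclass; modelgen fans this out to the four targets.
from dataclasses import dataclass, field

@dataclass
class Brand:
    name: str                                 # canonical name
    aliases: list[str] = field(default_factory=list)
    source: str = "ocr"                       # ocr | local_vlm | cloud_llm | manual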

DEV ENVIRONMENT

Tilt + Kind for local dev. Routing via Envoy Gateway on port 8080 — no nginx-ingress.

The Tiltfile lives at ctrl/Tiltfile and applies the kustomize overlay ctrl/k8s/overlays/dev/. Cluster name: kind-mpr. Tilt port-forwards Envoy (8080) and MinIO (9000 API, 9001 console).

  • /api/detect/stream/* → FastAPI SSE (3600s idle timeout)
  • /api/* → FastAPI
  • /, /detection/* → detection-ui (with WS upgrade for Vite HMR)
# Add to /etc/hosts
127.0.0.1 mpr.local.ar k8s.mpr.local.ar

# Bring the cluster up
cd ctrl
./kind-create.sh           # one-time
tilt up                    # builds + applies + port-forwards

# UI:    http://k8s.mpr.local.ar:8080/
# API:   http://k8s.mpr.local.ar:8080/api/
# MinIO: http://localhost:9001  (console; admin / minioadmin)

# Force a UI rebuild
tilt trigger detection-ui

QUICK REFERENCE

Common commands and switches for working in MPR.

# Render SVGs from DOT files
for f in docs/architecture/*.dot; do dot -Tsvg "$f" -o "${f%.dot}.svg"; done

# Regenerate models from core/schema/models/
bash ctrl/generate.sh

# Switch inference between local and GPU host
INFERENCE_URL=                     # local (CV runs in API process)
INFERENCE_URL=http://gpu-host:8000 # remote (core/gpu/server.py)

# Skip VLM escalation paths
SKIP_VLM=1
SKIP_CLOUD=1

# Tilt
cd ctrl && tilt up
tilt trigger detection-ui