Detection Pipeline — Execution Path

Overview

A pipeline run is a sequence of named stages that read and write a shared DetectState dict. Stages are defined in core/detect/stages/; the orchestrator (core/detect/graph/runner.py) flattens the profile's PipelineConfig graph into a linear order, runs each stage, and emits SSE events to the browser.
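
At its simplest, a stage is a function that reads and mutates that shared dict. A minimal sketch of the contract, assuming DetectState is a plain dict and stages are registered by name (STAGES, stage, and the state keys here are illustrative, not the actual symbols in nodes.py):

    from typing import Callable

    DetectState = dict  # shared mutable state threaded through every stage

    STAGES: dict[str, Callable[[DetectState], None]] = {}

    def stage(name: str):
        """Register a stage under a name so the runner can look it up after flattening."""
        def register(fn: Callable[[DetectState], None]):
            STAGES[name] = fn
            return fn
        return register

    @stage("extract_frames")
    def extract_frames(state: DetectState) -> None:
        # read inputs placed in state by job setup or earlier stages...
        source = state["source_path"]
        # ...and write outputs for downstream stages to consume
        state["frames"] = [f"{source}#frame-{i}" for i in range(3)]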

The full stage list is in core/detect/graph/nodes.py:

extract_frames → filter_scenes
              → field_segmentation → detect_edges
              → detect_objects → preprocess → run_ocr
              → match_brands → escalate_vlm → escalate_cloud
              → compile_report

See 03-detection-pipeline.svg for the graph view.

Profile

A Profile row in Postgres holds two JSONB blobs:

  • pipeline — a PipelineConfig (stages + edges + routing rules) defining the topology
  • configs — {stage_name: {...}} per-stage parameters (fps, thresholds, prompts, ...)

Profiles are the config mechanism: duplicate a profile and tweak it instead of patching defaults. core/detect/profile.py loads profiles by name; _load_profile() in nodes.py merges the job's config_overrides on top.
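
A minimal sketch of that merge, assuming configs and config_overrides are both {stage_name: params} dicts; merge_overrides is an illustrative stand-in for the logic inside _load_profile(), not its real name:

    def merge_overrides(configs: dict, overrides: dict) -> dict:
        """Overlay the job's config_overrides on top of the profile's per-stage configs."""
        merged = {name: dict(params) for name, params in configs.items()}
        for stage_name, params in overrides.items():
            merged.setdefault(stage_name, {}).update(params)
        return merged

    profile_configs = {"extract_frames": {"fps": 2}, "run_ocr": {"lang": "en"}}
    job_overrides = {"extract_frames": {"fps": 8}}
    effective = merge_overrides(profile_configs, job_overrides)
    assert effective["extract_frames"]["fps"] == 8     # override wins
    assert effective["run_ocr"]["lang"] == "en"        # profile default kept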

Stage runner

PipelineRunner (in core/detect/graph/runner.py) iterates the flattened stages and between each one checks the job's control flags (all keyed by job_id); a minimal sketch follows the list:

  • cancel — set_cancel_check(job_id, fn); raises PipelineCancelled to abort
  • pause / resume — a threading.Event per job; _wait_if_paused() blocks
  • step — like resume but auto-pauses after the next stage completes
  • pause-after-stage — toggle to step through every stage
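
A minimal sketch of those checks, assuming one threading.Event per job and a boolean cancel flag; the class and attribute names are illustrative, not the real PipelineRunner internals:

    import threading

    class PipelineCancelled(Exception):
        pass

    class RunnerControls:
        """Per-job control surface; names are illustrative."""
        def __init__(self) -> None:
            self.cancelled = False
            self.resume_event = threading.Event()
            self.resume_event.set()              # running by default
            self.step_mode = False

        def check_cancel(self) -> None:
            if self.cancelled:
                raise PipelineCancelled

        def wait_if_paused(self) -> None:
            self.resume_event.wait()             # blocks while pause holds the event cleared

    def run(stages, state: dict, controls: RunnerControls) -> None:
        for _name, fn in stages:
            controls.check_cancel()              # cancel wins over pause
            controls.wait_if_paused()
            fn(state)
            if controls.step_mode:
                controls.resume_event.clear()    # auto-pause after each stage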

Each stage runs inside trace_node(state, name) (sets a span used by tracing) and emits running → done (or skipped) transitions via core/detect/emit.py.

Inference: GPU-host indirection

core/detect/graph/nodes.py reads INFERENCE_URL from the environment and passes it to every CV/ML stage (the branch is sketched after the list):

  • INFERENCE_URL="" (default in dev) — stages call CV/ML routines in-process
  • INFERENCE_URL=http://gpu-host:8000 — stages POST to the GPU server (core/gpu/server.py) which exposes /detect, /ocr, /preprocess, /vlm, /detect_edges, /segment_field (each with a /debug variant that returns intermediate masks for the overlay viewer)
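
A sketch of that branch for a single stage, standard library only; local_detect_objects and the request payload shape are assumptions, and the real stages may use a different HTTP client:

    import json
    import os
    import urllib.request

    INFERENCE_URL = os.environ.get("INFERENCE_URL", "")

    def local_detect_objects(frames: list[str]) -> dict:
        return {"boxes": [], "frames": len(frames)}    # stand-in for the in-process routine

    def detect_objects(frames: list[str]) -> dict:
        if not INFERENCE_URL:
            return local_detect_objects(frames)        # dev default: in-process
        req = urllib.request.Request(
            f"{INFERENCE_URL}/detect",
            data=json.dumps({"frames": frames}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:      # POST to the GPU host
            return json.load(resp)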

Note: dev and GPU machines are separate boxes on the same LAN; inference is a network call. Heavy ML deps (torch, transformers, paddleocr) live only in core/gpu/pyproject.toml — the API host doesn't need them.

Browser-side CV (OpenCV WASM)

Some stages (notably the field/edge stages) can run in the browser via OpenCV WASM (ui/detection-app/src/cv/wasmBridge.ts) for fast iteration without a round trip to the GPU host. The browser UI is the test surface for the "replay loop" — change a config, replay one stage, see the overlay. Browser CV uses OpenCV WASM directly; there are no TypeScript ports of the algorithms.

Cloud VLM escalation

escalate_vlm (local VLM on GPU host) and escalate_cloud (Anthropic / Gemini / OpenAI / Groq via core/detect/providers/) are the last-resort branches for unresolved candidates from match_brands. Skip flags (gating sketched below):

  • SKIP_VLM=1 — emits skipped for escalate_vlm
  • SKIP_CLOUD=1 — emits skipped for escalate_cloud
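
A sketch of that gating for escalate_vlm, with emit_transition standing in for emit.transition(...) from core/detect/emit.py; the state key is an assumption:

    import os

    def emit_transition(stage: str, status: str) -> None:
        print(f"{stage}: {status}")   # stand-in for emit.transition(...)

    def escalate_vlm(state: dict) -> None:
        if os.environ.get("SKIP_VLM") == "1":
            emit_transition("escalate_vlm", "skipped")   # stage is bypassed entirely
            return
        emit_transition("escalate_vlm", "running")
        # ...send unresolved candidates from match_brands to the local VLM...
        state["vlm_results"] = []
        emit_transition("escalate_vlm", "done")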

Checkpoints, StageOutput, and replay

Two tables back the replay loop:

  • Checkpoint (core/db/models.py:Checkpoint) — a tree node: (parent_id, stage_name, config_overrides, stats). No blobs. Lets the UI show a branching history of "what configs did we try at this stage?"
  • StageOutput — a flat upsert table keyed by (job_id, stage_name) holding the stage's output dict. replay-stage reads upstream outputs from here so a single stage can be re-run without rerunning the whole pipeline (sketched below).
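
A sketch of the upsert-and-read pattern, using SQLite in place of Postgres to stay self-contained; the table shape mirrors the description above, but names are illustrative (the real model lives in core/db/models.py):

    import json
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE stage_output (
        job_id TEXT, stage_name TEXT, output TEXT,
        PRIMARY KEY (job_id, stage_name))""")

    def upsert_output(job_id: str, stage_name: str, output: dict) -> None:
        # one row per (job_id, stage_name); a re-run overwrites the previous output
        db.execute(
            "INSERT INTO stage_output VALUES (?, ?, ?) "
            "ON CONFLICT(job_id, stage_name) DO UPDATE SET output = excluded.output",
            (job_id, stage_name, json.dumps(output)))

    def upstream_outputs(job_id: str, stage_names: list[str]) -> dict:
        # what replay-stage needs: outputs of the stages feeding the one being re-run
        marks = ",".join("?" * len(stage_names))
        rows = db.execute(
            f"SELECT stage_name, output FROM stage_output "
            f"WHERE job_id = ? AND stage_name IN ({marks})",
            (job_id, *stage_names))
        return {name: json.loads(blob) for name, blob in rows}

    upsert_output("job-123", "detect_objects", {"boxes": [[0, 0, 32, 32]]})
    print(upstream_outputs("job-123", ["detect_objects"]))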

API surface (core/api/detect/replay.py); a hypothetical client call follows the list:

  • GET /checkpoints/{timeline_id} — full tree
  • POST /replay — clone a checkpoint into a new job, run from a chosen stage
  • POST /replay-stage — re-run one stage in place using upstream StageOutput rows
  • GET /overlays/{timeline_id}/{job_id}/{stage}/{seq} — debug overlays from MinIO
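
A hypothetical client call against POST /replay-stage; the payload fields are guesses at the schema, not its actual contract:

    import json
    import urllib.request

    body = json.dumps({
        "job_id": "job-123",                            # fields here are assumptions
        "stage": "match_brands",
        "config_overrides": {"match_brands": {"threshold": 0.85}},
    }).encode()
    req = urllib.request.Request(
        "http://localhost:8000/replay-stage",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))   # one stage re-runs using stored upstream outputs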

Event flow (SSE)

Stages call emit.transition(...) / emit.log(...) / emit.boxes(...) etc. (core/detect/emit.py). These push into Redis (core/detect/events.py). The SSE endpoint GET /detect/stream/{job_id} (core/api/detect/sse.py) drains the Redis list and writes to the open SSE response. Envoy keeps the connection open for up to 3600s (see ctrl/k8s/base/envoy.yaml).

stage code
  → emit.* (core/detect/emit.py)
  → push_detect_event → Redis RPUSH
  → [poll] /detect/stream/{job_id} → SSE chunk
  → fetch ReadableStream in detection-app
  → Pinia store update → Vue panel re-render
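
A condensed sketch of the server end of that flow, assuming FastAPI plus redis-py and a per-job Redis list; the key name and event framing are assumptions, not the real core/api/detect/sse.py:

    import redis
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse

    app = FastAPI()
    r = redis.Redis()

    @app.get("/detect/stream/{job_id}")
    def stream(job_id: str):
        def drain():
            while True:
                item = r.blpop(f"detect:{job_id}", timeout=30)   # key name assumed
                if item is None:
                    break                       # idle timeout; client can reconnect
                _key, payload = item
                yield f"data: {payload.decode()}\n\n"            # one SSE chunk
        return StreamingResponse(drain(), media_type="text/event-stream")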

Pipeline control endpoints

All under core/api/detect/run.py (a step-through example follows the list):

  • POST /run — start a job from a timeline + profile
  • POST /stop/{job_id} — cancel
  • POST /pause/{job_id} / POST /resume/{job_id}
  • POST /step/{job_id} — run one stage and pause
  • POST /pause-after-stage/{job_id} — toggle pause-after-each-stage
  • GET /status/{job_id} — current stage, progress
  • POST /clear/{job_id} — discard runtime state
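
A hypothetical step-through session using these endpoints; the paths come from the list above, but the base URL and the /status response field are assumptions:

    import json
    import time
    import urllib.request

    BASE = "http://localhost:8000"   # wherever the API is served (assumption)

    def post(path: str) -> None:
        urllib.request.urlopen(urllib.request.Request(f"{BASE}{path}", method="POST"))

    def status(job_id: str) -> dict:
        with urllib.request.urlopen(f"{BASE}/status/{job_id}") as resp:
            return json.load(resp)

    job_id = "job-123"
    post(f"/pause/{job_id}")                  # hold the pipeline before stepping
    for _ in range(3):
        post(f"/step/{job_id}")               # run exactly one stage, then auto-pause
        time.sleep(1)                         # crude wait; real clients watch the SSE stream
        print(status(job_id).get("stage"))    # "stage" field is assumed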

Where the chunker UI fits

ui/chunker/ is a standalone testing utility for the source-chunking step (split a long source video into chunks the user picks for a Timeline). It is not a pipeline stage and is not part of the detection flow. The detection pipeline reads already-chunked sources from MinIO via core/api/detect/sources.py.

Files

Concern                        File
Stage list                     core/detect/graph/nodes.py
Runner (cancel/pause/resume)   core/detect/graph/runner.py
Profile loading                core/detect/profile.py
Event emission                 core/detect/emit.py, core/detect/events.py
SSE endpoint                   core/api/detect/sse.py
Replay API                     core/api/detect/replay.py
Checkpoint storage             core/detect/checkpoint/storage.py
GPU server                     core/gpu/server.py
Browser CV bridge              ui/detection-app/src/cv/wasmBridge.ts
Cloud VLM providers            core/detect/providers/