<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Mitus — Architecture</title>
<style>
@import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600&family=JetBrains+Mono:wght@400;500&display=swap');
* { margin: 0; padding: 0; box-sizing: border-box; }
body {
background: #1e1e2e;
color: #cdd6f4;
font-family: 'Inter', sans-serif;
line-height: 1.6;
height: 100vh;
overflow: hidden;
display: flex;
flex-direction: column;
}
header {
padding: 16px 24px;
border-bottom: 1px solid #313244;
display: flex;
align-items: baseline;
gap: 16px;
flex-shrink: 0;
}
header h1 {
font-family: 'JetBrains Mono', monospace;
font-size: 22px;
font-weight: 600;
letter-spacing: 3px;
color: #89b4fa;
}
header .subtitle {
font-size: 13px;
color: #6c7086;
letter-spacing: 1px;
text-transform: uppercase;
}
.layout {
display: flex;
flex: 1;
min-height: 0;
}
nav {
display: flex;
flex-direction: column;
width: 220px;
flex-shrink: 0;
background: #181825;
border-right: 1px solid #313244;
padding: 8px 0;
overflow-y: auto;
}
nav a {
padding: 10px 20px;
font-family: 'JetBrains Mono', monospace;
font-size: 12px;
color: #a6adc8;
text-decoration: none;
border-left: 2px solid transparent;
cursor: pointer;
transition: all 0.15s;
}
nav a:hover { color: #cdd6f4; background: #313244; }
nav a.active { color: #89b4fa; border-left-color: #89b4fa; background: #1e2d3e; }
nav .group {
font-family: 'JetBrains Mono', monospace;
font-size: 10px;
color: #585b70;
letter-spacing: 1px;
text-transform: uppercase;
padding: 16px 20px 6px;
}
main {
flex: 1;
overflow: auto;
padding: 32px 48px;
}
.graph-section {
display: none;
animation: fadeIn 0.2s ease;
}
.graph-section.active { display: block; }
@keyframes fadeIn {
from { opacity: 0; }
to { opacity: 1; }
}
.graph-section h2 {
font-family: 'JetBrains Mono', monospace;
font-size: 15px;
font-weight: 500;
color: #a6adc8;
margin-bottom: 8px;
letter-spacing: 1px;
}
.graph-section p {
font-size: 13px;
color: #6c7086;
margin-bottom: 24px;
max-width: 800px;
}
.graph-container {
background: #11111b;
border: 1px solid #313244;
padding: 24px;
overflow: auto;
}
.graph-container a { display: block; }
.graph-container img { max-width: 100%; height: auto; }
.legend {
display: flex;
gap: 24px;
margin-top: 16px;
font-size: 11px;
font-family: 'JetBrains Mono', monospace;
color: #6c7086;
}
.legend span::before {
content: '';
display: inline-block;
width: 8px;
height: 8px;
margin-right: 6px;
border-radius: 50%;
}
.legend .python::before { background: #cba6f7; }
.legend .rust::before { background: #89b4fa; }
.legend .hw::before { background: #a6e3a1; }
.legend .fs::before { background: #585b70; }
/* Repo tree */
.tree-container {
background: #11111b;
border: 1px solid #313244;
padding: 24px;
overflow: auto;
}
.repo-tree {
font-family: 'JetBrains Mono', monospace;
font-size: 13px;
line-height: 1.7;
color: #a6adc8;
}
.t-root { color: #89b4fa; font-weight: 600; font-size: 15px; }
.t-dir { color: #cdd6f4; font-weight: 500; }
.t-rust { color: #89b4fa; font-weight: 500; }
.t-py { color: #cba6f7; font-weight: 500; }
.t-comment { color: #6c7086; }
/* Prose sections */
.graph-section h3 {
font-family: 'JetBrains Mono', monospace;
font-size: 13px;
font-weight: 500;
color: #cdd6f4;
letter-spacing: 1px;
margin: 32px 0 10px;
text-transform: uppercase;
}
.prose { max-width: 820px; }
.prose p {
font-size: 14px;
color: #a6adc8;
margin-bottom: 14px;
line-height: 1.7;
}
.prose p b { color: #cdd6f4; font-weight: 600; }
.prose code {
font-family: 'JetBrains Mono', monospace;
font-size: 12px;
color: #89b4fa;
background: #181825;
padding: 1px 5px;
border-radius: 3px;
}
.prose pre {
background: #11111b;
border: 1px solid #313244;
padding: 14px 16px;
margin: 8px 0 18px;
border-radius: 4px;
overflow-x: auto;
}
.prose pre code {
background: transparent;
padding: 0;
color: #cdd6f4;
font-size: 12px;
}
.prose ul {
margin: 8px 0 16px 20px;
font-size: 14px;
color: #a6adc8;
line-height: 1.7;
}
.prose ul li { margin-bottom: 6px; }
.prose .note {
border-left: 3px solid #f9e2af;
background: #2a2a3e;
padding: 10px 14px;
margin: 12px 0 18px;
font-size: 13px;
color: #cdd6f4;
}
.cmp-table {
width: 100%;
border-collapse: collapse;
font-size: 13px;
margin: 8px 0 20px;
border: 1px solid #313244;
}
.cmp-table th {
text-align: left;
background: #181825;
color: #a6adc8;
font-family: 'JetBrains Mono', monospace;
font-size: 11px;
letter-spacing: 1px;
padding: 10px 14px;
border-bottom: 1px solid #313244;
}
.cmp-table td {
padding: 10px 14px;
color: #a6adc8;
border-bottom: 1px solid #313244;
vertical-align: top;
}
.cmp-table tr:last-child td { border-bottom: none; }
/* Mobile */
.menu-toggle {
display: none;
background: transparent;
border: 1px solid #313244;
color: #cdd6f4;
padding: 6px 10px;
font-family: 'JetBrains Mono', monospace;
font-size: 14px;
cursor: pointer;
line-height: 1;
margin-left: auto;
}
.menu-toggle:hover { background: #313244; }
.nav-backdrop {
display: none;
position: absolute;
inset: 0;
background: rgba(0, 0, 0, 0.5);
z-index: 10;
}
.layout.nav-open .nav-backdrop { display: block; }
@media (max-width: 720px) {
header { padding: 10px 12px; gap: 8px; }
header h1 { font-size: 16px; letter-spacing: 1px; }
header .subtitle { display: none; }
.menu-toggle { display: inline-block; }
.layout { position: relative; }
nav {
position: absolute;
left: 0; top: 0; bottom: 0;
width: 220px;
z-index: 20;
transform: translateX(-100%);
transition: transform 0.2s ease;
box-shadow: 2px 0 8px rgba(0, 0, 0, 0.5);
}
.layout.nav-open nav { transform: translateX(0); }
main { padding: 16px; }
.graph-section h2 { font-size: 13px; }
.prose p, .prose ul { font-size: 13px; }
}
</style>
</head>
<body>
<header>
<h1>MITUS</h1>
<span class="subtitle">Stream viewer + agent — architecture</span>
<button class="menu-toggle" onclick="toggleNav()" aria-label="Toggle navigation">☰</button>
</header>
<div class="layout">
<div class="nav-backdrop" onclick="toggleNav()"></div>
<nav>
<div class="group">Overview</div>
<a class="active" onclick="show('overview')">Goal &amp; walkthrough</a>
<a onclick="show('usage')">Usage</a>
<a onclick="show('system')">System</a>
<div class="group">Transports</div>
<a onclick="show('python')">Python pipeline</a>
<a onclick="show('rust_client')">Rust client</a>
<a onclick="show('rust_server')">Rust server</a>
<a onclick="show('crates')">Rust crates</a>
<div class="group">Reference</div>
<a onclick="show('repo')">Repository</a>
<a onclick="show('notes')">Design notes</a>
</nav>
<main>
<section id="overview" class="graph-section active">
<h2>GOAL &amp; WALKTHROUGH</h2>
<p>Mitus records a remote desktop, transcribes its audio, extracts scene-change frames, and exposes both to an LLM agent for ad-hoc Q&amp;A.</p>
<div class="prose">
<h3>What it is</h3>
<p>A two-machine setup: the <b>sender</b> (a Wayland desktop) captures screen + audio and ships an encoded stream to the <b>receiver</b>. The receiver records to disk, runs scene detection on the live feed to extract per-event JPEG frames, transcribes the audio, and presents the result in a GTK4 GUI. The GUI doubles as an LLM client: select a frame or transcript span, hit Enter, and an agent (Claude SDK or any OpenAI-compatible endpoint) answers using the selected media as context.</p>
<h3>Why the split</h3>
<p>Capture wants Wayland + a VAAPI-friendly GPU; analysis wants CUDA for both faster-whisper and ffmpeg scene detection. Different machines, different drivers — the network stream is the seam. The receiver also runs the GUI because the recordings live on its disk and the agent consumes large frames as local files rather than as blobs shipped over a wire.</p>
<h3>Two transport modes</h3>
<p>Both modes produce the <b>same on-disk session layout</b> (<code>data/&lt;session_id&gt;/stream/</code>, <code>frames/</code>, <code>audio/</code>, <code>transcript.json</code>) so the GUI doesn't care which path the bytes took. The choice is a CLI flag.</p>
<ul>
<li><b>Python (default).</b> Sender is a bash watchdog wrapping the <code>ffmpeg</code> CLI. Receiver is <code>cht/stream/recorder.py</code>: an <code>ffmpeg</code> listener that writes fragmented MP4, relays UDP to <code>mpv</code>, and emits scene frames from a <code>showinfo</code> stdout pipe. Simple, all in one process; every restart costs a few seconds.</li>
<li><b>Rust (<code>--rust</code>).</b> A standalone Rust workspace under <code>media/</code>: <code>cht-client</code> on the sender, <code>cht-server</code> on the receiver. Wire protocol is a typed <code>WirePacket</code> framing instead of raw mpegts. Scene detection still runs in Python via a Unix-socket relay from the server. Connect time drops from ~20s to ~3s; session reload from disk is 12s.</li>
</ul>
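<p>The shared layout boils down to a handful of well-known paths under the session root. A hypothetical Python helper (not an actual <code>cht/</code> module) makes the contract concrete:</p>
<pre><code>from pathlib import Path

# Illustrative only: names mirror the documented session layout.
def session_paths(data_root, session_id):
    root = Path(data_root) / session_id
    return {
        "stream": root / "stream",            # fMP4 recording
        "frames": root / "frames",            # scene-change JPEGs
        "audio": root / "audio",              # extracted audio
        "transcript": root / "transcript.json",
    }

paths = session_paths("data", "2026-05-06_11-51-43")
print(paths["transcript"])  # data/2026-05-06_11-51-43/transcript.json</code></pre>
<p>Because both transports write exactly these paths, the GUI and agent never need to know which transport produced them.</p>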
<div class="note">The <code>media/</code> directory holds the Rust transport. While both modes coexist, that name is a misnomer — a future rename is planned. For now, "Rust transport" and "<code>media/</code>" mean the same thing.</div>
<h3>What the agent sees</h3>
<p>Two reference syntaxes resolve to media when sent: <code>@F0001</code>…<code>@F0042</code> for frames, <code>@T0001</code>…<code>@T0010</code> for transcript segments. Single-word verbs <code>describe</code> and <code>answer</code> are sent verbatim — no system prompt, no boilerplate. If you want detail, you type it. The agent runner injects only the referenced frame paths and transcript text alongside the user message.</p>
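<p>The reference grammar is small enough to capture in a few lines. A sketch of a parser for it (the real one lives in <code>cht/agent/</code> and may differ in detail):</p>
<pre><code>import re

# Accepts single refs (@F0042, @T5) and ranges (@F1-3).
REF = re.compile(r"@([FT])(\d+)(?:-(\d+))?")

def expand_refs(text):
    refs = []
    for kind, start, end in REF.findall(text):
        last = int(end) if end else int(start)
        refs.extend((kind, n) for n in range(int(start), last + 1))
    return refs

print(expand_refs("describe @F1-3 and @T5"))
# [('F', 1), ('F', 2), ('F', 3), ('T', 5)]</code></pre>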
</div>
</section>
<section id="usage" class="graph-section">
<h2>USAGE</h2>
<p>How to start a session — sender side, receiver side, both transports.</p>
<div class="prose">
<p>Both <code>ctrl/client.sh</code> and <code>ctrl/app.sh</code> take a transport flag — <code>--python</code> (default) or <code>--rust</code>. The <code>ctrl/</code> wrappers are the entrypoints; <code>media/ctrl/*</code> and <code>sender/stream_av.py</code> are implementation details they dispatch to.</p>
<h3>Receiver (mcrn) — GUI</h3>
<p><b>Python transport (default):</b></p>
<pre><code>./ctrl/app.sh --python</code></pre>
<p><b>Rust transport:</b></p>
<pre><code>./ctrl/server.sh # cht-server on TCP :4447 (Rust mode only)
./ctrl/app.sh --rust</code></pre>
<p>Python mode does its own TCP listening inside the GUI process — no separate server step.</p>
<h3>Sender</h3>
<p><b>Python transport:</b></p>
<pre><code>./ctrl/client.sh --python [RECEIVER_IP] [PORT] # default port 4444</code></pre>
<p>(Runs <code>sudo python3 sender/stream_av.py</code> under the hood — <code>sudo</code> is required for <code>kmsgrab</code>.)</p>
<p><b>Rust transport:</b></p>
<pre><code>./ctrl/client.sh --rust [server_addr] # default mcrndeb:4447</code></pre>
<h3>Sync</h3>
<p>Both machines share the same source tree; <code>ctrl/sync.sh</code> rsyncs from the dev host to <code>mcrndeb</code>. The receiver's filesystem is also bind-mounted at <code>~/mcrn</code> on the dev host for quick file access.</p>
<h3>Inside the GUI</h3>
<ul>
<li><b>Frames panel</b> — click to select; <code>←/→</code> navigate.</li>
<li><b>Transcript panel</b> — click to select; <code>↑/↓</code> navigate; <code>Shift</code> to extend.</li>
<li><b>Enter</b> — sends <code>answer</code> + selected refs to the agent.</li>
<li><b>Describe / Answer</b> buttons — same idea, single-word verb prepended.</li>
<li><b>Agent input</b> — type freely; <code>@F1-3</code> and <code>@T5</code> attach refs.</li>
<li><b>Esc</b> — clear selection. <b>Del</b> — clear agent output.</li>
<li><b>Ctrl+R</b> — manual segment cut.</li>
</ul>
<h3>Agent provider</h3>
<p>Resolution order in <code>cht/agent/runner.py</code>:</p>
<ul>
<li><code>GROQ_API_KEY</code> → OpenAI-compatible client against Groq.</li>
<li><code>OPENAI_API_KEY</code> → OpenAI / OpenAI-compatible.</li>
<li>(default) → Claude Code SDK using your local CC subscription.</li>
</ul>
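<p>As a sketch, that resolution order is a plain environment check. Names below are illustrative; the real logic is in <code>cht/agent/runner.py</code>:</p>
<pre><code>import os

def pick_provider(env=None):
    env = os.environ if env is None else env
    if env.get("GROQ_API_KEY"):
        return "openai-compatible (Groq)"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    return "claude-code-sdk"

print(pick_provider({"GROQ_API_KEY": "gsk_xxx"}))  # openai-compatible (Groq)
print(pick_provider({}))                           # claude-code-sdk</code></pre>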
</div>
</section>
<section id="system" class="graph-section">
<h2>SYSTEM ARCHITECTURE</h2>
<p>End-to-end view: sender capture → network → receiver record + analyse → GUI + agent. Both transports converge on the same on-disk session layout.</p>
<div class="graph-container">
<a href="viewer.html?src=graphs/system.svg"><img src="graphs/system.svg" alt="System architecture"></a>
</div>
<div class="legend">
<span class="python">Python</span>
<span class="rust">Rust</span>
<span class="hw">Hardware / external</span>
<span class="fs">Filesystem</span>
</div>
</section>
<section id="python" class="graph-section">
<h2>PYTHON PIPELINE</h2>
<p>Default mode. Bash + ffmpeg CLI on the sender; <code>StreamRecorder</code> + <code>SessionProcessor</code> in <code>cht/stream/</code> on the receiver. Scene detection rides the recorder's <code>ffmpeg</code> stdout pipe — sub-second latency, no extra process.</p>
<div class="graph-container">
<a href="viewer.html?src=graphs/python_pipeline.svg"><img src="graphs/python_pipeline.svg" alt="Python pipeline"></a>
</div>
<div class="legend">
<span class="python">Python module</span>
<span class="rust">External binary (ffmpeg)</span>
<span class="hw">Hardware / OS source</span>
<span class="fs">Filesystem output</span>
</div>
</section>
<section id="rust_client" class="graph-section">
<h2>RUST CLIENT — sender</h2>
<p><code>media/client/</code> — replaces <code>sender/stream_av.sh</code> when running with <code>--rust</code>. Two backends: subprocess (default, wraps ffmpeg CLI) and an experimental direct VAAPI capture/encoder.</p>
<div class="graph-container">
<a href="viewer.html?src=graphs/rust_client.svg"><img src="graphs/rust_client.svg" alt="Rust client pipeline"></a>
</div>
</section>
<section id="rust_server" class="graph-section">
<h2>RUST SERVER — receiver</h2>
<p><code>media/server/</code> — replaces <code>StreamRecorder</code> when running with <code>--rust</code>. TCP listener with a typed <code>WirePacket</code> framing; routes Video/Audio/Control packets to ffmpeg recording, ADTS audio, and a Unix-socket scene relay.</p>
<div class="graph-container">
<a href="viewer.html?src=graphs/rust_server.svg"><img src="graphs/rust_server.svg" alt="Rust server pipeline"></a>
</div>
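<div class="prose">
<p>The wire format itself is defined in <code>cht-common</code>; its exact field layout isn't documented here, so the following Python sketch only illustrates the general shape of typed, length-prefixed framing. Every field in it is an assumption:</p>
<pre><code>import struct

KIND = {"video": 0, "audio": 1, "control": 2}
HEADER = "!BQI"  # kind, pts (microseconds), payload length = 13 bytes

def encode_packet(kind, pts_us, payload):
    return struct.pack(HEADER, KIND[kind], pts_us, len(payload)) + payload

def decode_packet(buf):
    kind, pts_us, n = struct.unpack_from(HEADER, buf)
    return kind, pts_us, buf[13:13 + n]

pkt = encode_packet("video", 1_000_000, b"nal-units")
print(decode_packet(pkt))  # (0, 1000000, b'nal-units')</code></pre>
<p>Typed framing of this kind is what lets the server route Video/Audio/Control packets without sniffing an mpegts stream, and it is why connecting reduces to a short handshake.</p>
</div>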
</section>
<section id="crates" class="graph-section">
<h2>RUST CRATES</h2>
<p>Cargo workspace under <code>media/</code>: three crates (<code>cht-common</code>, <code>cht-client</code>, <code>cht-server</code>) and their external deps. Designed to be reusable as a standalone tool — <code>mpr</code> is expected to depend on it too.</p>
<div class="graph-container">
<a href="viewer.html?src=graphs/crates.svg"><img src="graphs/crates.svg" alt="Rust crates"></a>
</div>
</section>
<section id="repo" class="graph-section">
<h2>REPOSITORY STRUCTURE</h2>
<p>Top-level layout. Python app under <code>cht/</code>; Rust transport under <code>media/</code>; sender bash under <code>sender/</code>; ops scripts under <code>ctrl/</code>.</p>
<div class="tree-container">
<pre class="repo-tree"><span class="t-root">cht/</span>
├── <span class="t-py">cht/</span> <span class="t-comment">Python app (GTK4 GUI, recording, transcribe, agent)</span>
│ ├── app.py · window.py <span class="t-comment">entrypoint + main window</span>
│ ├── config.py · session.py <span class="t-comment">app config, session manifest</span>
│ ├── stream/ <span class="t-comment">recorder · processor · tracker · lifecycle · ffmpeg helpers</span>
│ ├── audio/ <span class="t-comment">waveform engine</span>
│ ├── transcriber/ <span class="t-comment">faster-whisper engine</span>
│ ├── scrub/ <span class="t-comment">proxy manager (scrub-mode preview)</span>
│ ├── index/ <span class="t-comment">frame index helpers</span>
│ ├── agent/ <span class="t-comment">runner · base · tools · claude_sdk_connection · openai_connection</span>
│ └── ui/ <span class="t-comment">timeline · monitor · scrub_bar · frames_panel · transcript_panel</span>
<span class="t-comment">agent_input · agent_output · markdown · keyboard · mpv · waveform</span>
├── <span class="t-rust">media/</span> <span class="t-comment">Rust transport workspace (Cargo) — rename planned while both modes coexist</span>
│ ├── common/ <span class="t-comment">cht-common — WirePacket, ControlMessage, logging</span>
│ ├── client/ <span class="t-comment">cht-client — sender (Wayland, VAAPI)</span>
│ ├── server/ <span class="t-comment">cht-server — receiver (TCP listener, ffmpeg fan-out)</span>
│ └── ctrl/ <span class="t-comment">build.sh · client.sh · server.sh</span>
├── <span class="t-dir">sender/</span> <span class="t-comment">Python-mode sender — stream_av.sh (bash watchdog around ffmpeg CLI)</span>
├── <span class="t-dir">ctrl/</span> <span class="t-comment">app.sh · server.sh · client.sh · sync.sh · bench.py · e2e_test.sh</span>
├── <span class="t-dir">tests/</span> <span class="t-comment">pytest suites — config · ffmpeg · manager · processor · timeline · tracker</span>
├── <span class="t-dir">data/</span> <span class="t-comment">runtime — sessions, active-session pointer (gitignored)</span>
├── <span class="t-dir">logs/</span> <span class="t-comment">runtime logs (gitignored)</span>
├── <span class="t-dir">docs/</span> <span class="t-comment">this site — index.html · viewer.html · graphs/ · render.sh</span>
└── pyproject.toml · uv.lock <span class="t-comment">Python deps via uv</span></pre>
</div>
</section>
<section id="notes" class="graph-section">
<h2>DESIGN NOTES</h2>
<p>Why some non-obvious choices look the way they do.</p>
<div class="prose">
<h3>Same on-disk layout from both transports</h3>
<p>The GUI, transcript, scene index, and agent never branch on transport mode — they only read files. The recording layout is the contract; the network protocol underneath is replaceable. This is what made the Rust port feasible without rewriting the analysis side.</p>
<h3>Scene detection lives in the recorder, not the processor</h3>
<p>In Python mode, scene-change frames come straight off the recorder's <code>ffmpeg</code> stdout pipe — sub-second, single process. Polling the fragmented MP4 from a separate process would add 3–5 s of disk-IPC latency. In Rust mode the same property is approximated by relaying raw H.264 over <code>scene.sock</code> to a separate ffmpeg, but that relay turns out to be the source of most current scene-detection pain (see <i>The scene detection saga</i> below).</p>
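<p>The recorder's coupling to <code>showinfo</code> is just line parsing: each selected frame produces a log line carrying its <code>pts_time</code>. A minimal sketch (the real parsing lives in <code>cht/stream/ffmpeg.py</code> and may differ):</p>
<pre><code>import re

PTS_TIME = re.compile(r"pts_time:\s*([0-9.]+)")

def scene_time(line):
    m = PTS_TIME.search(line)
    return float(m.group(1)) if m else None

line = "[Parsed_showinfo_1 @ 0x55] n: 7 pts: 126126 pts_time:4.204 ..."
print(scene_time(line))  # 4.204</code></pre>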
<h3>Why bother with the Rust port</h3>
<p>Two measured wins drove the work: connect time dropped from ~20 s (CLI ffmpeg startup + mpegts negotiation) to ~3 s (typed handshake), and session reload from disk dropped to 12 s. The Python recorder still works fine for development; the Rust path matters when you reconnect a lot.</p>
<h3>One-word verbs, no system prompt</h3>
<p>Pressing Enter sends <code>answer</code> + selected refs verbatim. There is no system prompt and no instruction template wrapping the message. If a question needs detail, the user types it — the model sees exactly what you'd see, not a contract you'd have to debug.</p>
<h3>Subprocess backend over a custom encoder</h3>
<p>The Rust client wraps the same <code>ffmpeg</code> CLI the Python sender uses, demuxes its NUT output in-process, and ships <code>EncodedPacket</code>s. Less code to own than a direct VAAPI encode path, and it inherits ffmpeg's robustness around odd Wayland/DRM transitions. The direct VAAPI backend exists but is experimental.</p>
<h3>Sender as a watchdog, not a daemon</h3>
<p>Python-mode <code>stream_av.sh</code> is a bash loop that restarts <code>ffmpeg</code> on stall (no progress for 10 s) and restarts immediately on the DRM-plane format change that fullscreen apps trigger. Cheaper and more reliable than building stall detection into a long-lived process.</p>
<h3>Struggles — the scene detection saga</h3>
<p>Scene detection is the part of the system that has fought back the hardest. The short version: <b>scene detection wants to live in the same ffmpeg process that does the decoding</b>, and every architecture change has had to relearn that.</p>
<h3>1. The "one behind" bug and the flush trick</h3>
<p>The original Python pipeline ran scene detection as a branch of the same <code>ffmpeg</code> that records: <code>select='gt(scene,T)'</code> → <code>showinfo</code> → MJPEG. The MJPEG encoder + muxer holds the selected frame in its internal buffer until <i>another</i> selected frame pushes it out — so the JPEG you receive at time <i>T</i> is actually the previous scene change, not the current one. A classic "one behind" bug.</p>
<p>Workaround: a flush trick — select extra adjacent frames after each scene change so the real frame gets pushed through immediately (<code>SCENE_FLUSH_FRAMES</code>, see <code>cht/config.py</code>, used in <code>cht/stream/ffmpeg.py</code> :: <code>receive_record_relay_and_detect</code>). Worked reliably <b>only because everything was in one ffmpeg process</b>.</p>
<h3>2. The Rust relay broke it</h3>
<p>When transport moved to Rust, the recorder split into two processes: Rust-side ffmpeg writes fMP4 + UDP, and a separate Python-side ffmpeg consumes raw H.264 from <code>scene.sock</code> for scene detection. Two new failure modes appeared:</p>
<ul>
<li><b>The flush trick stopped flushing.</b> The MJPEG encoder behaves differently in a standalone pipe-fed ffmpeg vs. as a branch of a multi-output process — adjacent extra frames no longer reliably push the previous selection through.</li>
<li><b>Decoder corruption from dropped packets.</b> The Rust relay uses <code>try_send</code> with a 100 ms socket write timeout (<code>media/server/src/session.rs</code>). On any backpressure the relay drops H.264 packets, which corrupts the downstream decoder until the next keyframe — and missed keyframes mean missed scene detections.</li>
</ul>
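<p>The drop behavior is easy to model. This Python sketch stands in for the Rust <code>try_send</code> path purely to show why backpressure is destructive here, since each dropped packet poisons the decoder until the next keyframe:</p>
<pre><code>import queue

relay = queue.Queue(maxsize=4)   # stand-in for the bounded channel
dropped = 0

def relay_packet(pkt):
    global dropped
    try:
        relay.put_nowait(pkt)    # analogous to try_send: never blocks
        return True
    except queue.Full:
        dropped += 1             # decoder corrupt until next keyframe
        return False

for i in range(6):               # burst with no consumer draining
    relay_packet(("h264", i))
print(dropped)  # 2</code></pre>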
<h3>3. Three dead ends</h3>
<ul>
<li><b>fMP4-tip extraction.</b> Trigger on showinfo, then extract the frame from the just-written fragmented MP4. Fragments only finalize at keyframe boundaries (~2 s with GOP 30), so <code>ffprobe</code> reports stale duration and the extracted frame comes from the <i>previous</i> scene.</li>
<li><b>Single Rust ffmpeg with mixed outputs.</b> The clean fix would be one ffmpeg in Rust doing record (<code>-c:v copy</code>) + relay (<code>-c:v copy</code>) + scene detect (decode + filter). It doesn't work — ffmpeg won't mix <code>-c:v copy</code> outputs with <code>-filter_complex</code> on a pipe input under <code>-hwaccel cuda</code>.</li>
<li><b>Tighter retry intervals on the extractor.</b> Dropping retry from 1 s to 0.3 s made things <i>worse</i> — concurrent ffmpeg processes thrashing the GPU rather than completing.</li>
</ul>
<h3>4. Where it actually landed</h3>
<p>Current working approach (Rust mode): the relay-fed scene detector fires <code>showinfo</code> with a timestamp, then Python extracts the frame from the recording file at <i>that</i> timestamp, with a wall-clock offset computed from the session-dir name. Reliable frames; ~1 s latency per scene from fMP4 fragment lag plus the per-extract ffmpeg spawn (~0.5 s). It's the system limping along until the proper fix lands. See <code>def/10-scene-detect-to-rust.md</code> and <code>def/ISSUES.md</code> R1, R3 for the full record.</p>
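<p>The offset computation is the fragile part: the session start time is recovered from the directory name rather than carried on the wire. A hedged sketch (dir-name format and ffmpeg arguments are assumptions):</p>
<pre><code>from datetime import datetime

def extract_cmd(session_dir, event_dt):
    name = session_dir.rstrip("/").rsplit("/", 1)[-1]
    start = datetime.strptime(name, "%Y-%m-%d_%H-%M-%S")
    offset = (event_dt - start).total_seconds()
    return ["ffmpeg", "-y", "-ss", f"{offset:.3f}",
            "-i", f"{session_dir}/stream/recording.mp4",
            "-frames:v", "1", "frame.jpg"]

cmd = extract_cmd("data/2026-05-06_11-51-43",
                  datetime(2026, 5, 6, 11, 52, 13))
print(cmd[3])  # 30.000</code></pre>
<p>Anything that skews the dir-name clock against the event clock shifts every extracted frame, which is exactly the failure mode that sending PTS on the wire would remove.</p>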
<div class="note"><b>Lesson.</b> The flush hack is a dead end in any pipe-fed context. Don't try to make it work over relay — move scene detection back into the same process that has the decoded frames. That's the only configuration that has ever been quiet.</div>
<h3>Future work</h3>
<h4 style="font-family:'JetBrains Mono',monospace;font-size:12px;color:#a6adc8;letter-spacing:1px;margin:20px 0 6px">Near term — scene detection as a 3rd output of the Rust server's ffmpeg</h4>
<p>Spec: <code>def/10-scene-detect-to-rust.md</code>. Add a third branch to the existing ffmpeg the Rust server already runs:</p>
<ul>
<li>Output 1: <code>-c:v copy</code> → fMP4 (unchanged)</li>
<li>Output 2: <code>-c:v copy</code> → UDP relay (unchanged)</li>
<li>Output 3: CUDA decode → <code>select='gt(scene,T)'</code> → <code>showinfo</code> → MJPEG out a second pipe / second Unix socket</li>
</ul>
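<p>As an illustration only: the dead end described earlier involved <code>-filter_complex</code>, so this sketch of the planned invocation uses a plain per-output <code>-vf</code> instead. Argument layout, paths, and filter syntax here are all assumptions, not the spec:</p>
<pre><code># Hypothetical argv for the three-output ffmpeg; see def/10-scene-detect-to-rust.md.
def three_output_args(session_dir, scene_t=0.3):
    return [
        "ffmpeg", "-hwaccel", "cuda", "-i", "pipe:0",
        # output 1: record, no re-encode
        "-map", "0:v", "-c:v", "copy", f"{session_dir}/stream/rec.mp4",
        # output 2: live UDP relay, no re-encode
        "-map", "0:v", "-c:v", "copy", "-f", "mpegts", "udp://127.0.0.1:4446",
        # output 3: decode, scene select, showinfo, MJPEG to a second pipe
        "-map", "0:v", "-vf", f"select='gt(scene,{scene_t})',showinfo",
        "-c:v", "mjpeg", "-f", "image2pipe", "pipe:1",
    ]

args = three_output_args("data/s1")
print(args.count("copy"))  # 2</code></pre>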
<p>This restores the single-process invariant — scene detection sees the same decoded frames as the recording branch, the flush behavior matches, no relay packet drops. Removes <code>detect_scenes_from_pipe()</code> in <code>cht/stream/ffmpeg.py</code>, the stdin-feeder thread in <code>cht/stream/processor.py</code>, and <code>scene_relay_task</code> in <code>media/server/src/session.rs</code>.</p>
<p>Adjacent improvements once that lands:</p>
<ul>
<li><b>Long-running extractor.</b> Keep one ffmpeg open and pipe seek commands rather than spawning per frame — eliminates the ~0.5 s startup hit.</li>
<li><b>PTS on the wire.</b> Have the Rust server send recording PTS alongside scene events so Python doesn't have to guess a wall-clock offset from the session-dir name (which is also why the first scene frame currently lands 7–10 s late in Rust mode — <code>def/ISSUES.md</code> R1).</li>
</ul>
<h4 style="font-family:'JetBrains Mono',monospace;font-size:12px;color:#a6adc8;letter-spacing:1px;margin:20px 0 6px">End goal — in-process libav filter graph</h4>
<p>Spec: <code>def/09-media-transport.md</code>. Rust server decodes via NVDEC, runs the scene filter in-process via the libav API, and writes JPEGs directly. No ffmpeg subprocess, no pipe, no relay, no extraction — scene-to-frame latency drops to near zero. The 3rd-output step above is the bridge: same single-process discipline, easier to land, and a clean rewrite target once it works.</p>
<p>Other items deferred to that broader port:</p>
<ul>
<li><b>Frame buffer / fast scrub.</b> GPU ring buffer of the last N decoded frames exposed over shared memory to the Python scrub UI — replaces the mpv proxy MJPEG hack (see <code>def/07-scrub-perf-ceiling.md</code>).</li>
<li><b>Typed control protocol.</b> The current <code>WirePacket</code> framing covers session lifecycle but not parameter changes; spec 09 sketches a control-message channel for things like live <code>scene_threshold</code> updates and reconnect-with-PTS.</li>
<li><b>Audio in the live UDP relay.</b> Rust mode currently has no audio in the live monitor (<code>def/ISSUES.md</code> R2) because the server's ffmpeg only takes video on its stdin. Resolved naturally once the server's ffmpeg also receives the audio track.</li>
</ul>
</div>
</section>
</main>
</div>
<script>
function show(id) {
document.querySelectorAll('.graph-section').forEach(s => s.classList.remove('active'));
document.querySelectorAll('nav a').forEach(a => a.classList.remove('active'));
document.getElementById(id).classList.add('active');
var navLink = document.querySelector('nav a[onclick="show(\'' + id + '\')"]');
if (navLink) navLink.classList.add('active');
document.querySelector('.layout').classList.remove('nav-open');
}
function toggleNav() {
document.querySelector('.layout').classList.toggle('nav-open');
}
</script>
</body>
</html>