01 - Scene Detection Sensitivity, Image Quality, and Granular Caching

Date

2025-10-28

Context

The last run on the zaca-run-scrapers sample (a Zed editor walkthrough) detected only 19 frames, with gaps of 7+ minutes between frames. Whisper did not run because the --run-whisper flag was not passed. JPEG compression left code and text hard to read.

Problems Identified

Scene detection too conservative - The default threshold of 30.0 missed file switches and scrolling in a clean UI (Zed, as opposed to VS Code)
No whisper transcription - The user expected it to run, but --run-whisper is opt-in
Poor JPEG quality - Default compression made code/text hard to read for OCR/vision
Subprocess-based FFmpeg - FFmpeg was invoked through shell commands instead of a Python library
All-or-nothing caching - --no-cache regenerates everything, including the slow whisper transcription

Changes Made

1. Scene Detection Sensitivity

Files: meetus/frame_extractor.py, process_meeting.py, meetus/workflow.py

Lowered default threshold: 30.0 → 15.0 (more sensitive for clean UIs)
Added --scene-threshold CLI argument (0-100, lower = more sensitive)
Added threshold to manifest for tracking
Updated docstring with usage guidelines:
- 15.0: Good for clean UIs like Zed
- 20-30: Busy UIs like VS Code
- 5-10: Very subtle changes
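The CLI side of this change can be sketched roughly as follows. This is a minimal illustration, not the actual process_meeting.py code: the helper names (scene_threshold, build_parser, manifest_entry) and the manifest keys are hypothetical. Only the flag name, the 0-100 range, and the 15.0 default come from the notes above.

```python
import argparse


def scene_threshold(value: str) -> float:
    """Parse a scene threshold and enforce the documented 0-100 range."""
    try:
        threshold = float(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a number: {value!r}")
    if not 0.0 <= threshold <= 100.0:
        raise argparse.ArgumentTypeError("scene threshold must be between 0 and 100")
    return threshold


def build_parser() -> argparse.ArgumentParser:
    """Build a reduced parser with only the scene-related options."""
    parser = argparse.ArgumentParser(prog="process_meeting.py")
    parser.add_argument("video")
    parser.add_argument("--scene-detection", action="store_true")
    parser.add_argument(
        "--scene-threshold",
        type=scene_threshold,
        default=15.0,
        help="0-100, lower = more sensitive (15 for clean UIs, 20-30 for busy UIs)",
    )
    return parser


def manifest_entry(args: argparse.Namespace) -> dict:
    """Record the settings used for a run (key names are assumptions)."""
    return {
        "scene_detection": args.scene_detection,
        "scene_threshold": args.scene_threshold,
    }
```

Validating the range at parse time means an out-of-range value fails before any frames are extracted.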

2. JPEG Quality Improvements

Files: meetus/frame_extractor.py

Interval extraction: Added cv2.IMWRITE_JPEG_QUALITY, 95 (line 60)
Scene detection: Added -q:v 2 to the FFmpeg call (highest quality on the usual 2-31 scale, where lower is better; line 94)

3. Migration to ffmpeg-python

Files: meetus/frame_extractor.py, requirements.txt

Replaced subprocess.run() with the ffmpeg-python library
Cleaner, more Pythonic API
Better error handling with ffmpeg.Error
Added ffmpeg-python to requirements.txt
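A scene-detection call through ffmpeg-python might look like the sketch below. It is not the code in meetus/frame_extractor.py. The function names, the output filename pattern, and the mapping of the 0-100 threshold onto FFmpeg's 0-1 scene score (dividing by 100) are all assumptions made for illustration.

```python
from pathlib import Path


def select_expression(threshold: float) -> str:
    """Build a select-filter expression from a 0-100 threshold.

    Assumption: the threshold is divided by 100 to match FFmpeg's
    0-1 scene-change score.
    """
    return f"gt(scene,{threshold / 100.0:.3f})"


def extract_scene_frames(video: str, out_dir: str, threshold: float = 15.0) -> list[Path]:
    """Write one JPEG per detected scene change and return the file paths."""
    # Third-party dependency (pip install ffmpeg-python); imported here so
    # the helper above stays usable without it.
    import ffmpeg

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    stream = (
        ffmpeg.input(video)
        .filter("select", select_expression(threshold))
        # vsync=vfr emits only the selected frames; q:v=2 keeps JPEG quality high.
        .output(str(out / "scene_%04d.jpg"), vsync="vfr", **{"q:v": 2})
        .overwrite_output()
    )
    try:
        stream.run(capture_stdout=True, capture_stderr=True)
    except ffmpeg.Error as exc:
        # ffmpeg.Error carries FFmpeg's stderr, which usually names the cause.
        detail = exc.stderr.decode(errors="replace") if exc.stderr else str(exc)
        raise RuntimeError(f"FFmpeg failed for {video}: {detail}") from exc

    return sorted(out.glob("scene_*.jpg"))
```

Compared with a raw subprocess call, the failure path here surfaces FFmpeg's own stderr in the raised exception rather than only a return code.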

4. Granular Cache Control

Files: process_meeting.py, meetus/workflow.py, meetus/cache_manager.py

Added three new flags for selective cache invalidation:

--skip-cache-frames: Regenerate frames (useful when tuning scene threshold)
--skip-cache-whisper: Rerun whisper transcription
--skip-cache-analysis: Rerun OCR/vision analysis

Key design:

--no-cache: Still works as before (new directory + regenerate everything)
New flags: Reuse existing output directory but selectively invalidate caches
Frames are cleaned up when regenerating to avoid stale data
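The selective invalidation described above can be pictured with a small sketch. The class and function names here (CacheFlags, use_cache, clear_frames) and the stage labels are hypothetical and are not taken from meetus/cache_manager.py. The sketch only shows the idea that each stage consults its own flag and that stale frames are removed before regeneration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CacheFlags:
    """One switch per stage, mirroring the three --skip-cache-* flags."""
    skip_frames: bool = False
    skip_whisper: bool = False
    skip_analysis: bool = False


def use_cache(stage: str, flags: CacheFlags, artifact: Path) -> bool:
    """Reuse a cached artifact only if it exists and its stage was not skipped."""
    skipped = {
        "frames": flags.skip_frames,
        "whisper": flags.skip_whisper,
        "analysis": flags.skip_analysis,
    }
    return artifact.exists() and not skipped[stage]


def clear_frames(frames_dir: Path) -> int:
    """Delete old JPEG frames so a new run cannot mix in stale ones."""
    removed = 0
    for frame in frames_dir.glob("*.jpg"):
        frame.unlink()
        removed += 1
    return removed
```

With skip_frames and skip_analysis set, the whisper transcript is still read from the existing output directory, which matches the iteration pattern of tuning the scene threshold without re-transcribing.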

Typical Workflow

# First run - generate everything including whisper (expensive, once)
python process_meeting.py samples/video.mkv --run-whisper --scene-detection --use-vision

# Iterate on scene threshold without re-running whisper
python process_meeting.py samples/video.mkv --scene-detection --scene-threshold 10 --use-vision --skip-cache-frames --skip-cache-analysis

# Try even more sensitive
python process_meeting.py samples/video.mkv --scene-detection --scene-threshold 5 --use-vision --skip-cache-frames --skip-cache-analysis

Notes

Whisper is the most expensive step and its output is reliable → always keep it cached during iteration
Scene detection needs tuning per UI style (Zed vs VS Code)
Vision analysis should regenerate when frames change
Walking through code (file switches, scrolling) should trigger scene changes

Files Modified

meetus/frame_extractor.py - Scene threshold, quality, ffmpeg-python
meetus/workflow.py - Cache flags, frame cleanup
meetus/cache_manager.py - Granular cache checks
process_meeting.py - CLI arguments
requirements.txt - Added ffmpeg-python
