mitus/def/01-scene-detection-quality-caching.md
2025-10-28 05:52:31 -03:00

01 - Scene Detection Sensitivity, Image Quality, and Granular Caching

Date

2025-10-28

Context

The last run on the zaca-run-scrapers sample (a Zed editor walkthrough) detected only 19 frames, with gaps of 7+ minutes. Whisper did not run because the flag was not passed. JPEG compression quality was too poor for code/text readability.

Problems Identified

  1. Scene detection too conservative - Default threshold of 30.0 missed file switches and scrolling in clean UI (Zed vs VS Code)
  2. No whisper transcription - User expected it to run but --run-whisper is opt-in
  3. Poor JPEG quality - Default compression made code/text hard to read for OCR/vision
  4. Subprocess-based FFmpeg - Using shell commands instead of Python library
  5. All-or-nothing caching - --no-cache regenerates everything including slow whisper transcription

Changes Made

1. Scene Detection Sensitivity

Files: meetus/frame_extractor.py, process_meeting.py, meetus/workflow.py

  • Lowered default threshold: 30.0 → 15.0 (more sensitive for clean UIs)
  • Added --scene-threshold CLI argument (0-100, lower = more sensitive)
  • Added threshold to manifest for tracking
  • Updated docstring with usage guidelines:
    • 15.0: Good for clean UIs like Zed
    • 20-30: Busy UIs like VS Code
    • 5-10: Very subtle changes
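
As a rough illustration of the CLI change, the sketch below validates a 0-100 threshold with argparse and turns it into an FFmpeg `select` expression. The helper names (`scene_threshold`, `scene_select_expr`, `build_parser`) are hypothetical, and the linear mapping from the 0-100 scale to FFmpeg's 0.0-1.0 scene score is an assumption, not something confirmed by the project code.

```python
import argparse


def scene_threshold(value: str) -> float:
    """argparse type: parse a threshold and require 0 <= t <= 100."""
    try:
        threshold = float(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a number: {value!r}")
    # NaN and infinity fail this range check as well.
    if not 0.0 <= threshold <= 100.0:
        raise argparse.ArgumentTypeError(
            f"must be between 0 and 100, got {threshold}"
        )
    return threshold


def scene_select_expr(threshold: float) -> str:
    """Build an FFmpeg select expression for a 0-100 threshold.

    Assumption: the 0-100 scale maps linearly onto FFmpeg's 0.0-1.0
    scene score (15.0 becomes 0.150).
    """
    return f"gt(scene,{threshold / 100:.3f})"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical, reduced version of the process_meeting.py parser."""
    parser = argparse.ArgumentParser(prog="process_meeting.py")
    parser.add_argument("video")
    parser.add_argument("--scene-detection", action="store_true")
    parser.add_argument(
        "--scene-threshold",
        type=scene_threshold,
        default=15.0,
        help="0-100, lower = more sensitive (default: 15.0)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(scene_select_expr(args.scene_threshold))
```

Validating in the `type=` callable keeps out-of-range values from ever reaching the extractor, and argparse reports them as ordinary usage errors.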

2. JPEG Quality Improvements

Files: meetus/frame_extractor.py

  • Interval extraction: Added cv2.IMWRITE_JPEG_QUALITY, 95 (line 60)
  • Scene detection: Added -q:v 2 to FFmpeg (best quality, line 94)
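
For the interval-extraction path, a minimal sketch of writing a frame with an explicit JPEG quality might look like the following. `save_frame_jpeg` is a hypothetical helper, not the function in meetus/frame_extractor.py; OpenCV is imported lazily only so the snippet can be loaded on a machine without it.

```python
def save_frame_jpeg(frame, path, quality: int = 95) -> bool:
    """Write a frame (a NumPy image array) as JPEG with explicit quality.

    Returns True if OpenCV reports a successful write.
    """
    if not 0 <= quality <= 100:
        raise ValueError(f"JPEG quality must be 0-100, got {quality}")

    import cv2  # opencv-python, imported lazily (assumed to be installed)

    # imwrite takes encoder options as a flat [flag, value, ...] list.
    return cv2.imwrite(str(path), frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
```

Passing the quality explicitly keeps the setting visible at the call site instead of relying on the encoder default.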

3. Migration to ffmpeg-python

Files: meetus/frame_extractor.py, requirements.txt

  • Replaced subprocess.run() with ffmpeg-python library
  • Cleaner, more Pythonic API
  • Better error handling with ffmpeg.Error
  • Added to requirements.txt
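
A hedged sketch of what scene-based extraction through ffmpeg-python can look like, including the `-q:v 2` setting and `ffmpeg.Error` handling mentioned above. The function names, the `vsync` option, and the output pattern are assumptions for illustration; the actual code in meetus/frame_extractor.py may differ.

```python
def scene_output_options(jpeg_qscale: int = 2) -> dict:
    """Output options for scene frames.

    "q:v" becomes `-q:v 2` on the FFmpeg command line. "vsync": "vfr" is an
    assumption that keeps only the selected frames; newer FFmpeg releases
    prefer `-fps_mode vfr`.
    """
    return {"vsync": "vfr", "q:v": jpeg_qscale}


def extract_scene_frames(video_path: str, out_pattern: str,
                         scene_score: float = 0.15) -> None:
    """Write one JPEG per detected scene change, e.g. to 'frames/%04d.jpg'.

    scene_score is FFmpeg's 0.0-1.0 scene value (0.15 here assumes the
    15.0 default is divided by 100).
    """
    import ffmpeg  # ffmpeg-python, imported lazily (assumed to be installed)

    stream = (
        ffmpeg.input(video_path)
        .filter("select", f"gt(scene,{scene_score})")
        .output(out_pattern, **scene_output_options())
        .overwrite_output()
    )
    try:
        stream.run(capture_stdout=True, capture_stderr=True)
    except ffmpeg.Error as exc:
        # ffmpeg.Error carries FFmpeg's stderr as bytes when it is captured.
        message = exc.stderr.decode(errors="replace")
        raise RuntimeError(f"FFmpeg failed: {message}") from exc
```

Compared with a hand-built `subprocess.run()` argument list, the fluent chain leaves filter escaping to the library, and failures surface as a typed exception with FFmpeg's own error output attached.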

4. Granular Cache Control

Files: process_meeting.py, meetus/workflow.py, meetus/cache_manager.py

Added three new flags for selective cache invalidation:

  • --skip-cache-frames: Regenerate frames (useful when tuning scene threshold)
  • --skip-cache-whisper: Rerun whisper transcription
  • --skip-cache-analysis: Rerun OCR/vision analysis

Key design:

  • --no-cache: Still works as before (new directory + regenerate everything)
  • New flags: Reuse existing output directory but selectively invalidate caches
  • Frames are cleaned up when regenerating to avoid stale data
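
The flag semantics above can be sketched roughly as follows. `CacheFlags`, `stages_to_regenerate`, and `clear_stale_frames` are hypothetical names, and the `*.jpg` glob is an assumption about how frames are stored; the real logic lives in meetus/workflow.py and meetus/cache_manager.py.

```python
from dataclasses import dataclass
from pathlib import Path

STAGES = ("frames", "whisper", "analysis")


@dataclass(frozen=True)
class CacheFlags:
    """Mirrors the CLI flags: --no-cache and the three --skip-cache-* flags."""
    no_cache: bool = False
    skip_cache_frames: bool = False
    skip_cache_whisper: bool = False
    skip_cache_analysis: bool = False


def stages_to_regenerate(flags: CacheFlags) -> set:
    """Return the stages whose cached output must be ignored."""
    if flags.no_cache:
        # --no-cache keeps its old meaning: regenerate everything.
        return set(STAGES)
    return {s for s in STAGES if getattr(flags, f"skip_cache_{s}")}


def clear_stale_frames(frames_dir: Path) -> int:
    """Delete previously extracted frames so old and new runs never mix."""
    removed = 0
    for frame in frames_dir.glob("*.jpg"):
        frame.unlink()
        removed += 1
    return removed
```

With this shape, `--skip-cache-frames --skip-cache-analysis` yields `{"frames", "analysis"}`, so the whisper transcript stays cached while the threshold is tuned.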

Typical Workflow

# First run - generate everything including whisper (expensive, once)
python process_meeting.py samples/video.mkv --run-whisper --scene-detection --use-vision

# Iterate on scene threshold without re-running whisper
python process_meeting.py samples/video.mkv --scene-detection --scene-threshold 10 --use-vision --skip-cache-frames --skip-cache-analysis

# Try even more sensitive
python process_meeting.py samples/video.mkv --scene-detection --scene-threshold 5 --use-vision --skip-cache-frames --skip-cache-analysis

Notes

  • Whisper is the most expensive step and its output is reliable → keep it cached during iteration
  • Scene detection needs tuning per UI style (Zed vs VS Code)
  • Vision analysis should regenerate when frames change
  • Walking through code (file switches, scrolling) should trigger scene changes

Files Modified

  • meetus/frame_extractor.py - Scene threshold, quality, ffmpeg-python
  • meetus/workflow.py - Cache flags, frame cleanup
  • meetus/cache_manager.py - Granular cache checks
  • process_meeting.py - CLI arguments
  • requirements.txt - Added ffmpeg-python