# 01 - Scene Detection Sensitivity, Image Quality, and Granular Caching
## Date
2025-10-28
## Context
The last run on the zaca-run-scrapers sample (a Zed editor walkthrough) detected only 19 frames, leaving gaps of 7+ minutes between them. Whisper did not run because the `--run-whisper` flag was not passed. JPEG compression was too aggressive for code and on-screen text to stay readable.
## Problems Identified
1. **Scene detection too conservative** - The default threshold of 30.0 missed file switches and scrolling in clean UIs like Zed, where changes are subtler than in a busier UI like VS Code
2. **No whisper transcription** - The user expected it to run, but `--run-whisper` is opt-in
3. **Poor JPEG quality** - Default compression made code and text hard to read for OCR and vision analysis
4. **Subprocess-based FFmpeg** - FFmpeg was called through shell commands instead of a Python library
5. **All-or-nothing caching** - `--no-cache` regenerates everything including slow whisper transcription
## Changes Made
### 1. Scene Detection Sensitivity
**Files:** `meetus/frame_extractor.py`, `process_meeting.py`, `meetus/workflow.py`
- Lowered default threshold: `30.0` → `15.0` (more sensitive for clean UIs)
- Added `--scene-threshold` CLI argument (0-100, lower = more sensitive)
- Added threshold to manifest for tracking
- Updated docstring with usage guidelines:
  - 15.0: Good for clean UIs like Zed
  - 20-30: Busy UIs like VS Code
  - 5-10: Very subtle changes
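A minimal sketch of how `--scene-threshold` could be declared and range-checked with `argparse`. The helper names (`scene_threshold`, `build_parser`, `to_ffmpeg_scene_score`) and the positional `video` argument are hypothetical, and so is the mapping from the 0-100 CLI scale to FFmpeg's 0-1 `scene` score (dividing by 100); the real wiring lives in `process_meeting.py` and `meetus/frame_extractor.py`.

```python
import argparse


def scene_threshold(value: str) -> float:
    """argparse type: accept a float in [0, 100]; lower = more sensitive."""
    threshold = float(value)  # ValueError is reported by argparse as an invalid value
    if not 0.0 <= threshold <= 100.0:  # also rejects NaN
        raise argparse.ArgumentTypeError(
            f"scene threshold must be between 0 and 100, got {threshold}"
        )
    return threshold


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical subset of the CLI, covering only scene-detection options."""
    parser = argparse.ArgumentParser(prog="process_meeting.py")
    parser.add_argument("video")
    parser.add_argument("--scene-detection", action="store_true")
    parser.add_argument(
        "--scene-threshold",
        type=scene_threshold,
        default=15.0,  # new default, tuned for clean UIs like Zed
        help="0-100, lower = more sensitive "
        "(15 clean UIs, 20-30 busy UIs, 5-10 very subtle changes)",
    )
    return parser


def to_ffmpeg_scene_score(threshold: float) -> float:
    """Assumed mapping from the 0-100 CLI scale to FFmpeg's 0-1 scene score."""
    return threshold / 100.0


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["samples/video.mkv", "--scene-detection", "--scene-threshold", "10"]
    )
    print(args.scene_threshold, to_ffmpeg_scene_score(args.scene_threshold))
```

Note that `argparse` only applies `type` to string defaults, so the numeric 15.0 default is never passed through the validator and has to be kept in range by hand.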
### 2. JPEG Quality Improvements
**Files:** `meetus/frame_extractor.py`
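- **Interval extraction**: Passed `cv2.IMWRITE_JPEG_QUALITY, 95` when writing frames with OpenCV (line 60). OpenCV documents 95 as its default JPEG quality, so this pins the value explicitly rather than raising it.
- **Scene detection**: Added `-q:v 2` to the FFmpeg command (line 94). For JPEG output a lower `-q:v` means higher quality, and 2 is effectively the best setting.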
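For reference, a sketch of the kind of FFmpeg scene-detection command these options describe, built as an argument list so it can be inspected or logged. The function name, the output pattern, the `threshold / 100` mapping to FFmpeg's `scene` score, and the use of `-vsync vfr` are assumptions, not copied from `frame_extractor.py` (newer FFmpeg releases deprecate `-vsync` in favor of `-fps_mode`).

```python
def scene_detection_cmd(video: str, out_pattern: str, threshold: float = 15.0) -> list[str]:
    """Build an FFmpeg argv that keeps only frames whose scene score exceeds the threshold."""
    score = threshold / 100.0  # assumed mapping: CLI 0-100 -> FFmpeg scene score 0-1
    return [
        "ffmpeg",
        "-i", video,
        # select passes a frame only when its scene-change score is above `score`;
        # the quotes stop FFmpeg's filtergraph parser from splitting on the comma
        "-vf", f"select='gt(scene,{score})'",
        # write only the selected frames instead of padding to a constant frame rate
        "-vsync", "vfr",
        # JPEG quantizer: lower is better, 2 is effectively the best quality
        "-q:v", "2",
        out_pattern,
    ]


if __name__ == "__main__":
    print(scene_detection_cmd("samples/video.mkv", "frames/scene_%04d.jpg", 10.0))
```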
### 3. Migration to ffmpeg-python
**Files:** `meetus/frame_extractor.py`, `requirements.txt`
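- Replaced direct `subprocess.run()` calls with the `ffmpeg-python` library (a wrapper that still invokes the `ffmpeg` binary, so FFmpeg must remain installed)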
- Cleaner, more Pythonic API
- Better error handling with `ffmpeg.Error`
- Added to requirements.txt
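A sketch of a scene-detection call written against `ffmpeg-python`, with failures surfaced through `ffmpeg.Error`. The function names, the output pattern, and the `threshold / 100` mapping to FFmpeg's `scene` score are assumptions. The guarded import exists only so the sketch can be loaded without the package; the project itself lists `ffmpeg-python` as a hard requirement.

```python
try:
    import ffmpeg  # provided by the ffmpeg-python package
except ImportError:  # lets this sketch load without the dependency
    ffmpeg = None


def build_scene_stream(video: str, out_pattern: str, threshold: float = 15.0):
    """Describe the scene-detection pipeline with ffmpeg-python (nothing runs yet)."""
    if ffmpeg is None:
        raise RuntimeError("ffmpeg-python is not installed (pip install ffmpeg-python)")
    score = threshold / 100.0  # assumed mapping to FFmpeg's 0-1 scene score
    return (
        ffmpeg.input(video)
        .filter("select", f"gt(scene,{score})")  # ffmpeg-python handles filtergraph escaping
        # vsync/q:v become -vsync vfr and -q:v 2 (-vsync is deprecated for -fps_mode in newer FFmpeg)
        .output(out_pattern, vsync="vfr", **{"q:v": 2})
        .overwrite_output()
    )


def extract_scene_frames(video: str, out_pattern: str, threshold: float = 15.0) -> None:
    """Run the pipeline and turn FFmpeg failures into a readable error."""
    stream = build_scene_stream(video, out_pattern, threshold)
    try:
        # capturing stderr means ffmpeg.Error carries FFmpeg's log on failure
        stream.run(capture_stdout=True, capture_stderr=True)
    except ffmpeg.Error as exc:
        detail = exc.stderr.decode(errors="replace") if exc.stderr else ""
        raise RuntimeError(f"FFmpeg failed on {video}: {detail}") from exc
```

`build_scene_stream(...).compile()` returns the argument list without running anything, which is handy for logging the exact command that will be executed.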
### 4. Granular Cache Control
**Files:** `process_meeting.py`, `meetus/workflow.py`, `meetus/cache_manager.py`
Added three new flags for selective cache invalidation:
- `--skip-cache-frames`: Regenerate frames (useful when tuning scene threshold)
- `--skip-cache-whisper`: Rerun whisper transcription
- `--skip-cache-analysis`: Rerun OCR/vision analysis
**Key design:**
- `--no-cache`: Still works as before (new directory + regenerate everything)
- New flags: Reuse existing output directory but selectively invalidate caches
- Frames are cleaned up when regenerating to avoid stale data
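One way to express this selective invalidation is a small plan object that decides, per stage, whether a cached result may be reused, plus a cleanup step for the frames directory. `CachePlan`, `add_cache_flags`, and `prepare_frames_dir` are hypothetical names; the real checks live in `meetus/cache_manager.py` and `meetus/workflow.py` and may be organized differently.

```python
import argparse
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CachePlan:
    """Which cached stages may be reused (hypothetical stand-in for cache_manager)."""
    frames: bool = True
    whisper: bool = True
    analysis: bool = True

    @classmethod
    def from_args(cls, args: argparse.Namespace) -> "CachePlan":
        if args.no_cache:
            # --no-cache starts a fresh output directory, so nothing is reusable
            return cls(frames=False, whisper=False, analysis=False)
        # otherwise reuse the existing directory and drop only the requested caches
        return cls(
            frames=not args.skip_cache_frames,
            whisper=not args.skip_cache_whisper,
            analysis=not args.skip_cache_analysis,
        )


def add_cache_flags(parser: argparse.ArgumentParser) -> None:
    """Register the cache-related flags described in this note."""
    parser.add_argument("--no-cache", action="store_true")
    parser.add_argument("--skip-cache-frames", action="store_true")
    parser.add_argument("--skip-cache-whisper", action="store_true")
    parser.add_argument("--skip-cache-analysis", action="store_true")


def prepare_frames_dir(frames_dir: Path, plan: CachePlan) -> None:
    """Delete old frames before regenerating so no stale images survive."""
    if not plan.frames and frames_dir.exists():
        shutil.rmtree(frames_dir)
    frames_dir.mkdir(parents=True, exist_ok=True)
```

With `--skip-cache-frames --skip-cache-analysis`, this plan keeps the whisper transcript and rebuilds the other two stages, matching the iteration loop in the workflow below.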
## Typical Workflow
```bash
# First run - generate everything including whisper (expensive, once)
python process_meeting.py samples/video.mkv --run-whisper --scene-detection --use-vision
# Iterate on scene threshold without re-running whisper
python process_meeting.py samples/video.mkv --scene-detection --scene-threshold 10 --use-vision --skip-cache-frames --skip-cache-analysis
# Try even more sensitive
python process_meeting.py samples/video.mkv --scene-detection --scene-threshold 5 --use-vision --skip-cache-frames --skip-cache-analysis
```
## Notes
- Whisper is the most expensive step, and its transcript does not depend on frame settings → always keep it cached during iteration
- Scene detection needs tuning per UI style (Zed vs VS Code)
- Vision analysis should regenerate when frames change
- Walking through code (file switches, scrolling) should trigger scene changes
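The note that vision analysis should regenerate when frames change describes a dependency between caches. The workflow above enforces it by passing `--skip-cache-analysis` by hand; a hedged sketch of how it could be checked instead is to store a fingerprint of the frame set (including the scene threshold, which the manifest already tracks) alongside the analysis output and treat the analysis as stale when the fingerprint no longer matches. `frames_fingerprint`, `analysis_is_fresh`, the `frames_fingerprint` JSON key, and the `*.jpg` frame naming are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def frames_fingerprint(frames_dir: Path, scene_threshold: float) -> str:
    """Hash the extraction setting plus each frame file's name and size."""
    h = hashlib.sha256()
    h.update(json.dumps({"scene_threshold": scene_threshold}).encode())
    for frame in sorted(frames_dir.glob("*.jpg")):
        # NUL and newline separators keep name/size boundaries unambiguous
        h.update(f"{frame.name}\0{frame.stat().st_size}\n".encode())
    return h.hexdigest()


def analysis_is_fresh(analysis_file: Path, fingerprint: str) -> bool:
    """Reuse cached analysis only if it was produced from the same frame set."""
    if not analysis_file.exists():
        return False
    try:
        cached = json.loads(analysis_file.read_text())
    except json.JSONDecodeError:
        return False  # a corrupt cache file counts as stale
    return isinstance(cached, dict) and cached.get("frames_fingerprint") == fingerprint
```

Hashing names and sizes rather than pixel data keeps the check cheap; a re-extraction that produced byte-identical files would still count as fresh, which is the desired outcome.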
## Files Modified
- `meetus/frame_extractor.py` - Scene threshold, quality, ffmpeg-python
- `meetus/workflow.py` - Cache flags, frame cleanup
- `meetus/cache_manager.py` - Granular cache checks
- `process_meeting.py` - CLI arguments
- `requirements.txt` - Added ffmpeg-python