scene detection quality and caching
def/01-scene-detection-quality-caching.md
# 01 - Scene Detection Sensitivity, Image Quality, and Granular Caching

## Date

2025-10-28

## Context

The last run on the zaca-run-scrapers sample (a Zed editor walkthrough) detected only 19 frames, with gaps of 7+ minutes. Whisper did not run because the flag was not passed. JPEG compression was too lossy for code/text to stay readable.

## Problems Identified

1. **Scene detection too conservative** - The default threshold of 30.0 missed file switches and scrolling in a clean UI (Zed, as opposed to VS Code)
2. **No whisper transcription** - The user expected it to run, but `--run-whisper` is opt-in
3. **Poor JPEG quality** - Default compression made code/text hard to read for OCR/vision
4. **Subprocess-based FFmpeg** - FFmpeg was invoked through `subprocess` calls instead of a Python library
5. **All-or-nothing caching** - `--no-cache` regenerates everything, including the slow whisper transcription

## Changes Made

### 1. Scene Detection Sensitivity
**Files:** `meetus/frame_extractor.py`, `process_meeting.py`, `meetus/workflow.py`

- Lowered default threshold: `30.0` → `15.0` (more sensitive for clean UIs)
- Added `--scene-threshold` CLI argument (0-100, lower = more sensitive)
- Added threshold to manifest for tracking
- Updated docstring with usage guidelines:
  - 15.0: Good for clean UIs like Zed
  - 20-30: Busy UIs like VS Code
  - 5-10: Very subtle changes

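A minimal sketch of how the `--scene-threshold` argument could be wired up with `argparse`, with the documented 0-100 range enforced at parse time. The helper names (`scene_threshold`, `build_parser`) and the manifest key are assumptions for illustration, not the actual code in `process_meeting.py`.

```python
import argparse


def scene_threshold(value: str) -> float:
    """Parse a --scene-threshold value and enforce the documented 0-100 range."""
    try:
        threshold = float(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a number: {value!r}")
    if not 0.0 <= threshold <= 100.0:
        raise argparse.ArgumentTypeError("threshold must be between 0 and 100")
    return threshold


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the real CLI; only the flags relevant here.
    parser = argparse.ArgumentParser(prog="process_meeting.py")
    parser.add_argument("video")
    parser.add_argument("--scene-detection", action="store_true")
    parser.add_argument(
        "--scene-threshold",
        type=scene_threshold,
        default=15.0,  # new default from this change
        help="0-100, lower = more sensitive",
    )
    return parser


args = build_parser().parse_args(
    ["samples/video.mkv", "--scene-detection", "--scene-threshold", "10"]
)
# Recording the value alongside the outputs makes runs comparable later.
manifest_entry = {"scene_threshold": args.scene_threshold}  # hypothetical key name
```

Validating in the `type=` callback means an out-of-range value fails before any frame extraction starts.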
### 2. JPEG Quality Improvements
**Files:** `meetus/frame_extractor.py`

- **Interval extraction**: Added `cv2.IMWRITE_JPEG_QUALITY, 95` (line 60)
- **Scene detection**: Added `-q:v 2` to FFmpeg (near-best JPEG quality; lower `-q:v` values compress less, line 94)

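For the interval-extraction path, the quality flag is passed to `cv2.imwrite` as a flat list of flag/value pairs. The sketch below assumes a hypothetical helper `save_frame_jpeg`; the real code in `meetus/frame_extractor.py` may be structured differently. OpenCV's documentation lists 95 as the default for this flag, so setting it explicitly mainly pins the value.

```python
from pathlib import Path


def save_frame_jpeg(frame, path: Path, quality: int = 95) -> None:
    """Write a BGR frame (NumPy array) to disk as a JPEG at the given quality."""
    if not 0 <= quality <= 100:
        raise ValueError(f"JPEG quality must be 0-100, got {quality}")

    import cv2  # imported lazily so the range check works without OpenCV installed

    # imwrite takes encoder params as [flag, value, ...] and returns False on failure.
    ok = cv2.imwrite(str(path), frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
    if not ok:
        raise OSError(f"could not write {path}")
```

Checking the return value matters because `cv2.imwrite` reports failure through its boolean return value, for example when the target directory does not exist.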
### 3. Migration to ffmpeg-python
**Files:** `meetus/frame_extractor.py`, `requirements.txt`

- Replaced `subprocess.run()` calls with the `ffmpeg-python` library
- Cleaner, more Pythonic API
- Better error handling with `ffmpeg.Error`
- Added `ffmpeg-python` to `requirements.txt`

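A rough sketch of what scene-based extraction can look like with `ffmpeg-python`, combining the `select` scene filter, `-q:v 2`, and `ffmpeg.Error` handling. The function names, the output file pattern, and the mapping of the 0-100 threshold onto FFmpeg's 0-1 scene score are assumptions, not the actual implementation.

```python
from pathlib import Path


def scene_filter_expr(threshold: float) -> str:
    """Map a 0-100 threshold to a select expression (assumed: scene score = threshold / 100)."""
    return f"gt(scene,{threshold / 100:.3f})"


def extract_scene_frames(video: Path, out_dir: Path, threshold: float = 15.0) -> list[Path]:
    """Write one JPEG per detected scene change and return the sorted frame paths."""
    import ffmpeg  # ffmpeg-python; lazy import keeps the helper above usable without it

    out_dir.mkdir(parents=True, exist_ok=True)
    pattern = str(out_dir / "scene_%04d.jpg")  # hypothetical naming scheme
    try:
        (
            ffmpeg.input(str(video))
            .filter("select", scene_filter_expr(threshold))
            # vsync=vfr keeps only the selected frames; newer FFmpeg builds prefer fps_mode.
            .output(pattern, vsync="vfr", **{"q:v": 2})
            .run(capture_stdout=True, capture_stderr=True, overwrite_output=True)
        )
    except ffmpeg.Error as exc:
        # ffmpeg.Error carries the captured stderr, which holds FFmpeg's own message.
        raise RuntimeError(exc.stderr.decode(errors="replace")) from exc
    return sorted(out_dir.glob("scene_*.jpg"))
```

Compared with a raw `subprocess.run()` call, the filter expression is escaped by the library and failures surface as a single exception type.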
### 4. Granular Cache Control
**Files:** `process_meeting.py`, `meetus/workflow.py`, `meetus/cache_manager.py`

Added three new flags for selective cache invalidation:
- `--skip-cache-frames`: Regenerate frames (useful when tuning scene threshold)
- `--skip-cache-whisper`: Rerun whisper transcription
- `--skip-cache-analysis`: Rerun OCR/vision analysis

**Key design:**
- `--no-cache`: Still works as before (new directory + regenerate everything)
- New flags: Reuse existing output directory but selectively invalidate caches
- Frames are cleaned up when regenerating to avoid stale data

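The flag semantics above can be sketched as a small per-stage decision. `CacheFlags`, `use_cache`, and `clear_stale_frames` are hypothetical names for illustration; the real checks live in `meetus/cache_manager.py` and `meetus/workflow.py` and may differ.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CacheFlags:
    """Mirror of the CLI cache flags (hypothetical container)."""

    no_cache: bool = False
    skip_cache_frames: bool = False
    skip_cache_whisper: bool = False
    skip_cache_analysis: bool = False

    def use_cache(self, stage: str) -> bool:
        """Return True if cached output for `stage` may be reused."""
        if self.no_cache:  # --no-cache overrides everything
            return False
        skipped = {
            "frames": self.skip_cache_frames,
            "whisper": self.skip_cache_whisper,
            "analysis": self.skip_cache_analysis,
        }
        return not skipped[stage]


def clear_stale_frames(frames_dir: Path) -> int:
    """Delete previously extracted JPEGs before regenerating; returns how many were removed."""
    removed = 0
    for jpg in frames_dir.glob("*.jpg"):
        jpg.unlink()
        removed += 1
    return removed


# Tuning the scene threshold: redo frames and analysis, keep the whisper transcript.
flags = CacheFlags(skip_cache_frames=True, skip_cache_analysis=True)
```

Keeping the decision in one place makes it easy to see that `--no-cache` wins over the granular flags.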
## Typical Workflow

```bash
# First run - generate everything including whisper (expensive, once)
python process_meeting.py samples/video.mkv --run-whisper --scene-detection --use-vision

# Iterate on scene threshold without re-running whisper
python process_meeting.py samples/video.mkv --scene-detection --scene-threshold 10 --use-vision --skip-cache-frames --skip-cache-analysis

# Try even more sensitive
python process_meeting.py samples/video.mkv --scene-detection --scene-threshold 5 --use-vision --skip-cache-frames --skip-cache-analysis
```

## Notes

- Whisper is the most expensive step and its output is reliable, so always keep it cached during iteration
- Scene detection needs tuning per UI style (Zed vs VS Code)
- Vision analysis should be regenerated whenever frames change
- Walking through code (file switches, scrolling) should trigger scene changes

## Files Modified
- `meetus/frame_extractor.py` - Scene threshold, quality, ffmpeg-python
- `meetus/workflow.py` - Cache flags, frame cleanup
- `meetus/cache_manager.py` - Granular cache checks
- `process_meeting.py` - CLI arguments
- `requirements.txt` - Added ffmpeg-python