# 01 - Scene Detection Sensitivity, Image Quality, and Granular Caching

## Date

2025-10-28

## Context

The last run on the zaca-run-scrapers sample (a Zed editor walkthrough) detected only 19 frames, with gaps of 7 minutes or more between frames. Whisper did not run because the `--run-whisper` flag was not passed. JPEG quality was also too low for code and on-screen text to stay readable.
## Problems Identified

1. **Scene detection too conservative** - The default threshold of 30.0 missed file switches and scrolling in clean UIs such as Zed (compared with busier ones like VS Code)
2. **No Whisper transcription** - The user expected it to run, but `--run-whisper` is opt-in
3. **Poor JPEG quality** - Default compression made code and text hard to read for OCR/vision
4. **Subprocess-based FFmpeg** - FFmpeg was invoked through shell commands instead of a Python library
5. **All-or-nothing caching** - `--no-cache` regenerates everything, including the slow Whisper transcription
## Changes Made

### 1. Scene Detection Sensitivity

**Files:** `meetus/frame_extractor.py`, `process_meeting.py`, `meetus/workflow.py`

- Lowered default threshold: `30.0` → `15.0` (more sensitive for clean UIs)
- Added `--scene-threshold` CLI argument (0-100, lower = more sensitive)
- Added threshold to manifest for tracking
- Updated docstring with usage guidelines:
  - 15.0: Good for clean UIs like Zed
  - 20-30: Busy UIs like VS Code
  - 5-10: Very subtle changes
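A minimal sketch of how the `--scene-threshold` flag could be parsed, range-checked, and recorded next to the other extraction settings. Only the flag names, the 0-100 range, and the `15.0` default come from the changes above; the function names (`scene_threshold`, `build_parser`, `manifest_entry`) and the manifest keys are hypothetical.

```python
import argparse


def scene_threshold(value: str) -> float:
    """argparse type: accept a float in [0, 100] (lower = more sensitive)."""
    threshold = float(value)  # a non-numeric value raises ValueError, reported by argparse
    if not 0.0 <= threshold <= 100.0:
        raise argparse.ArgumentTypeError(
            f"scene threshold must be between 0 and 100, got {threshold}"
        )
    return threshold


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical subset of the process_meeting.py CLI."""
    parser = argparse.ArgumentParser(description="Process a meeting recording")
    parser.add_argument("video", help="Path to the input video")
    parser.add_argument("--scene-detection", action="store_true",
                        help="Extract frames on scene changes")
    parser.add_argument(
        "--scene-threshold", type=scene_threshold, default=15.0,
        help="0-100, lower = more sensitive. ~15 for clean UIs (Zed), "
             "20-30 for busy UIs (VS Code), 5-10 for very subtle changes",
    )
    return parser


def manifest_entry(args: argparse.Namespace) -> dict:
    """Hypothetical manifest fields, so each run's frames can be traced to its settings."""
    return {
        "video": args.video,
        "scene_detection": args.scene_detection,
        "scene_threshold": args.scene_threshold,
    }
```

Validating inside the `type=` callable means an out-of-range value fails with a normal argparse usage error before any frames are extracted.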
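### 2. JPEG Quality Improvements

**Files:** `meetus/frame_extractor.py`

- **Interval extraction**: Added `cv2.IMWRITE_JPEG_QUALITY, 95` to the OpenCV write parameters (line 60; OpenCV's scale is 0-100, higher is better)
- **Scene detection**: Added `-q:v 2` to the FFmpeg command (line 94; FFmpeg's usual JPEG range is 2-31, lower is better, so 2 is the highest quality in that range)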
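For the interval path, a sketch of sampling frames with OpenCV and saving them at JPEG quality 95. It assumes a fixed sampling interval (the `interval_s=5.0` default is a placeholder, not the project's setting) and hypothetical function and file names. OpenCV is imported inside the extractor so the pure index helper works without it.

```python
from pathlib import Path


def interval_frame_indices(fps: float, total_frames: int, interval_s: float) -> list[int]:
    """Indices of the frames to keep when sampling every `interval_s` seconds."""
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    step = max(1, round(fps * interval_s))  # never step by 0 frames
    return list(range(0, total_frames, step))


def save_interval_frames(video_path: str, out_dir: str,
                         interval_s: float = 5.0, quality: int = 95) -> list[Path]:
    """Hypothetical interval extractor; interval_s=5.0 is a placeholder default."""
    import cv2  # opencv-python, imported lazily so the helper above has no OpenCV dependency

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    if not cap.isOpened():
        raise OSError(f"could not open {video_path}")
    written: list[Path] = []
    try:
        fps = cap.get(cv2.CAP_PROP_FPS)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        for idx in interval_frame_indices(fps, total, interval_s):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the sampled frame
            ok, frame = cap.read()
            if not ok:
                break
            path = out / f"frame_{idx:06d}.jpg"
            # OpenCV JPEG quality runs 0-100 (higher is better)
            if not cv2.imwrite(str(path), frame, [cv2.IMWRITE_JPEG_QUALITY, quality]):
                raise OSError(f"failed to write {path}")
            written.append(path)
    finally:
        cap.release()
    return written
```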
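### 3. Migration to ffmpeg-python

**Files:** `meetus/frame_extractor.py`, `requirements.txt`

- Replaced `subprocess.run()` calls with the `ffmpeg-python` library
- Cleaner, more Pythonic API for building FFmpeg commands
- Better error handling: failures raise `ffmpeg.Error`, which exposes FFmpeg's stderr when output is captured
- Added `ffmpeg-python` to `requirements.txt`
- `ffmpeg-python` is a wrapper that still runs the `ffmpeg` binary, so FFmpeg must remain installed and on `PATH`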
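A sketch of scene-detection extraction written with ffmpeg-python, combining the threshold, the `-q:v 2` quality setting, and `ffmpeg.Error` handling. It assumes the 0-100 CLI threshold is divided by 100 to match the 0-1 `scene` score used by FFmpeg's `select` filter; the function names and the `scene_%05d.jpg` output pattern are hypothetical.

```python
from pathlib import Path


def scene_select_expr(threshold: float) -> str:
    """Assumed mapping: CLI threshold 0-100 -> FFmpeg scene score 0-1."""
    return f"gt(scene,{threshold / 100:.4g})"


def extract_scene_frames(video_path: str, out_dir: str, threshold: float = 15.0) -> list[Path]:
    """Hypothetical scene-change extractor built with ffmpeg-python."""
    import ffmpeg  # ffmpeg-python; it still runs the ffmpeg binary under the hood

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stream = (
        ffmpeg
        .input(str(video_path))
        # keep only frames whose scene-change score exceeds the threshold
        .filter("select", scene_select_expr(threshold))
        .output(
            str(out / "scene_%05d.jpg"),
            vsync="vfr",   # write only selected frames (newer FFmpeg prefers fps_mode="vfr")
            **{"q:v": 2},  # "q:v" is not a valid Python identifier, so pass it via a dict
        )
        .overwrite_output()
    )
    try:
        stream.run(capture_stdout=True, capture_stderr=True)
    except ffmpeg.Error as exc:
        stderr = exc.stderr.decode(errors="replace") if exc.stderr else ""
        raise RuntimeError(f"ffmpeg failed on {video_path}: {stderr}") from exc
    return sorted(out.glob("scene_*.jpg"))
```

ffmpeg-python turns each output keyword into a `-key value` pair, which is why `-q:v 2` becomes `**{"q:v": 2}`. Capturing stderr is what makes `exc.stderr` useful in the error message; without it the attribute is `None`.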
### 4. Granular Cache Control

**Files:** `process_meeting.py`, `meetus/workflow.py`, `meetus/cache_manager.py`

Added three new flags for selective cache invalidation:

- `--skip-cache-frames`: Regenerate frames (useful when tuning the scene threshold)
- `--skip-cache-whisper`: Rerun Whisper transcription
- `--skip-cache-analysis`: Rerun OCR/vision analysis

**Key design:**

- `--no-cache`: Works as before (creates a new output directory and regenerates everything)
- New flags: Reuse the existing output directory and invalidate only the selected caches
- Existing frames are deleted before regeneration, so frames from a previous run never mix with the new set
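A sketch of how the selective invalidation could be decided: given which stage outputs already exist in the output directory, return the stages that must run. `CacheFlags`, `stages_to_rerun`, `reset_frames_dir`, and the stage names are hypothetical stand-ins for whatever `meetus/cache_manager.py` and `meetus/workflow.py` actually do, and the sketch ignores the separate `--run-whisper` opt-in.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path

STAGES = ["frames", "whisper", "analysis"]  # hypothetical pipeline order


@dataclass(frozen=True)
class CacheFlags:
    no_cache: bool = False
    skip_cache_frames: bool = False
    skip_cache_whisper: bool = False
    skip_cache_analysis: bool = False


def stages_to_rerun(flags: CacheFlags, cached: set[str]) -> list[str]:
    """Stages to run, in order: forced by a flag or missing from the cache."""
    if flags.no_cache:
        return list(STAGES)  # fresh output directory, everything regenerates
    forced = {
        "frames": flags.skip_cache_frames,
        "whisper": flags.skip_cache_whisper,
        "analysis": flags.skip_cache_analysis,
    }
    return [stage for stage in STAGES if forced[stage] or stage not in cached]


def reset_frames_dir(frames_dir: Path) -> None:
    """Delete old frames before regenerating so stale images don't mix in."""
    if frames_dir.exists():
        shutil.rmtree(frames_dir)
    frames_dir.mkdir(parents=True)
```

For example, `--skip-cache-frames --skip-cache-analysis` on a fully cached directory yields `["frames", "analysis"]`, leaving the transcript untouched.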
## Typical Workflow

```bash
# First run: generate everything, including Whisper (expensive, done once)
python process_meeting.py samples/video.mkv --run-whisper --scene-detection --use-vision

# Iterate on the scene threshold without rerunning Whisper
python process_meeting.py samples/video.mkv --scene-detection --scene-threshold 10 --use-vision --skip-cache-frames --skip-cache-analysis

# Try an even more sensitive threshold
python process_meeting.py samples/video.mkv --scene-detection --scene-threshold 5 --use-vision --skip-cache-frames --skip-cache-analysis
```
## Notes

- Whisper is the most expensive step, and its output is reliable and unaffected by frame or threshold changes, so always keep it cached while iterating
- Scene detection needs tuning per UI style (e.g. Zed vs VS Code)
- Vision analysis should be regenerated whenever frames change
- Walking through code (file switches, scrolling) should trigger scene changes
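The requirement that vision analysis regenerate when frames change could also be checked automatically instead of relying on `--skip-cache-analysis`. A sketch of one possible approach, assuming a hypothetical `frames_fingerprint` key is written to the manifest whenever analysis runs: hash the frame names and contents, and treat the analysis cache as stale when the hash differs.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def fingerprint(frames: Iterable[tuple[str, bytes]]) -> str:
    """Order-independent hash over (file name, file contents) pairs."""
    digest = hashlib.sha256()
    for name, data in sorted(frames):
        digest.update(name.encode() + b"\0")       # NUL-separated: file names can't contain it
        digest.update(hashlib.sha256(data).digest())  # fixed-length content hash
    return digest.hexdigest()


def frames_fingerprint(frames_dir: Path) -> str:
    return fingerprint((p.name, p.read_bytes()) for p in frames_dir.glob("*.jpg"))


def analysis_is_stale(frames_dir: Path, manifest: dict) -> bool:
    """True when the frames on disk differ from those the cached analysis used."""
    return manifest.get("frames_fingerprint") != frames_fingerprint(frames_dir)
```

Hashing contents rather than only file names also catches the case where a rerun with a different threshold happens to produce the same number of identically named frames.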
## Files Modified

- `meetus/frame_extractor.py` - Scene threshold, quality, ffmpeg-python
- `meetus/workflow.py` - Cache flags, frame cleanup
- `meetus/cache_manager.py` - Granular cache checks
- `process_meeting.py` - CLI arguments
- `requirements.txt` - Added ffmpeg-python