scene detection quality and caching

Mariano Gabriel
2025-10-28 05:52:31 -03:00
parent c871af2def
commit b1e1daf278
6 changed files with 169 additions and 30 deletions

View File

@@ -0,0 +1,80 @@
# 01 - Scene Detection Sensitivity, Image Quality, and Granular Caching
## Date
2025-10-28
## Context
The last run on the zaca-run-scrapers sample (a Zed editor walkthrough) detected only 19 frames, with gaps of 7+ minutes between them. Whisper wasn't running because the flag wasn't passed, and the default JPEG compression left code/text too blurry for comfortable reading.
## Problems Identified
1. **Scene detection too conservative** - The default threshold of 30.0 missed file switches and scrolling in clean UIs like Zed (busier UIs like VS Code tolerate higher thresholds)
2. **No whisper transcription** - User expected it to run but `--run-whisper` is opt-in
3. **Poor JPEG quality** - Default compression made code/text hard to read for OCR/vision
4. **Subprocess-based FFmpeg** - Using shell commands instead of Python library
5. **All-or-nothing caching** - `--no-cache` regenerates everything including slow whisper transcription
## Changes Made
### 1. Scene Detection Sensitivity
**Files:** `meetus/frame_extractor.py`, `process_meeting.py`, `meetus/workflow.py`
- Lowered default threshold: `30.0` → `15.0` (more sensitive for clean UIs)
- Added `--scene-threshold` CLI argument (0-100, lower = more sensitive)
- Added threshold to manifest for tracking
- Updated docstring with usage guidelines:
  - 15.0: Good for clean UIs like Zed
  - 20-30: Busy UIs like VS Code
  - 5-10: Very subtle changes
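
The CLI threshold flows straight into FFmpeg's `select` filter, whose scene score is normalized to 0-1 rather than 0-100. A minimal sketch of that mapping (the helper name is illustrative, not part of the codebase):

```python
def scene_select_expr(threshold: float) -> str:
    """Build the FFmpeg select-filter expression for a 0-100 CLI threshold."""
    if not 0 <= threshold <= 100:
        raise ValueError(f"threshold must be in 0-100, got {threshold}")
    # FFmpeg's scene score runs 0-1, so scale the CLI value down by 100
    return f"gt(scene,{threshold / 100})"
```

So the default of 15.0 becomes `gt(scene,0.15)`: a frame is kept whenever its scene-change score exceeds 0.15.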
### 2. JPEG Quality Improvements
**Files:** `meetus/frame_extractor.py`
- **Interval extraction**: Added `cv2.IMWRITE_JPEG_QUALITY, 95` (line 60)
- **Scene detection**: Added `-q:v 2` to FFmpeg (best quality, line 94)
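
The two writers expose the same knob in different shapes: OpenCV takes a `[flag, value]` params list (0-100, higher is better), while FFmpeg's `-q:v` uses a 2-31 MJPEG scale where 2 is best. A dependency-free sketch of the OpenCV side (the constant 1 mirrors the value `cv2.IMWRITE_JPEG_QUALITY` resolves to; the helper itself is hypothetical):

```python
IMWRITE_JPEG_QUALITY = 1  # same value cv2.IMWRITE_JPEG_QUALITY resolves to

def jpeg_write_params(quality: int = 95) -> list:
    """Params list for cv2.imwrite(path, frame, params), clamped to 0-100."""
    q = max(0, min(100, quality))  # OpenCV expects a 0-100 quality value
    return [IMWRITE_JPEG_QUALITY, q]
```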
### 3. Migration to ffmpeg-python
**Files:** `meetus/frame_extractor.py`, `requirements.txt`
- Replaced `subprocess.run()` with `ffmpeg-python` library
- Cleaner, more Pythonic API
- Better error handling with `ffmpeg.Error`
- Added to requirements.txt
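
One detail the migration preserves: the `showinfo` filter logs each selected frame to stderr with a `pts_time:` field, which is where the frame timestamps come from. A stdlib-only sketch of that parsing step (the sample log lines are abbreviated, not real FFmpeg output):

```python
import re

def parse_showinfo_timestamps(stderr: str) -> list:
    """Pull pts_time values (seconds) out of showinfo's stderr output."""
    return [float(t) for t in re.findall(r"pts_time:([\d.]+)", stderr)]

sample = (
    "[Parsed_showinfo_1 @ 0x5570] n:0 pts:512 pts_time:4.27 ...\n"
    "[Parsed_showinfo_1 @ 0x5570] n:1 pts:9216 pts_time:76.8 ...\n"
)
```

Each timestamp is then paired, in order, with the frame files the same command wrote to disk.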
### 4. Granular Cache Control
**Files:** `process_meeting.py`, `meetus/workflow.py`, `meetus/cache_manager.py`
Added three new flags for selective cache invalidation:
- `--skip-cache-frames`: Regenerate frames (useful when tuning scene threshold)
- `--skip-cache-whisper`: Rerun whisper transcription
- `--skip-cache-analysis`: Rerun OCR/vision analysis
**Key design:**
- `--no-cache`: Still works as before (new directory + regenerate everything)
- New flags: Reuse existing output directory but selectively invalidate caches
- Frames are cleaned up when regenerating to avoid stale data
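
The gating rule behind these flags fits in a few lines; a toy version (flag names simplified from the `CacheManager` ones):

```python
def cache_enabled(stage: str, use_cache: bool = True,
                  skip_frames: bool = False, skip_whisper: bool = False,
                  skip_analysis: bool = False) -> bool:
    """Return True if the cached result for `stage` may be reused."""
    if not use_cache:  # --no-cache: regenerate everything
        return False
    skip = {"frames": skip_frames, "whisper": skip_whisper,
            "analysis": skip_analysis}
    return not skip[stage]  # each flag invalidates only its own stage
```

The iteration loop in the workflow below relies on exactly this: frames and analysis are skipped while whisper stays cached.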
## Typical Workflow
```bash
# First run - generate everything including whisper (expensive, once)
python process_meeting.py samples/video.mkv --run-whisper --scene-detection --use-vision
# Iterate on scene threshold without re-running whisper
python process_meeting.py samples/video.mkv --scene-detection --scene-threshold 10 --use-vision --skip-cache-frames --skip-cache-analysis
# Try even more sensitive
python process_meeting.py samples/video.mkv --scene-detection --scene-threshold 5 --use-vision --skip-cache-frames --skip-cache-analysis
```
## Notes
- Whisper is the most expensive and reliable step → always cache it during iteration
- Scene detection needs tuning per UI style (Zed vs VS Code)
- Vision analysis should regenerate when frames change
- Walking through code (file switches, scrolling) should trigger scene changes
## Files Modified
- `meetus/frame_extractor.py` - Scene threshold, quality, ffmpeg-python
- `meetus/workflow.py` - Cache flags, frame cleanup
- `meetus/cache_manager.py` - Granular cache checks
- `process_meeting.py` - CLI arguments
- `requirements.txt` - Added ffmpeg-python

View File: meetus/cache_manager.py

```diff
@@ -12,7 +12,9 @@ logger = logging.getLogger(__name__)
 class CacheManager:
     """Manage caching of intermediate processing results."""
 
-    def __init__(self, output_dir: Path, frames_dir: Path, video_name: str, use_cache: bool = True):
+    def __init__(self, output_dir: Path, frames_dir: Path, video_name: str, use_cache: bool = True,
+                 skip_cache_frames: bool = False, skip_cache_whisper: bool = False,
+                 skip_cache_analysis: bool = False):
         """
         Initialize cache manager.
@@ -20,12 +22,18 @@ class CacheManager:
             output_dir: Output directory for cached files
             frames_dir: Directory for cached frames
             video_name: Name of the video (stem)
-            use_cache: Whether to use caching
+            use_cache: Whether to use caching globally
+            skip_cache_frames: Skip cached frames specifically
+            skip_cache_whisper: Skip cached whisper specifically
+            skip_cache_analysis: Skip cached analysis specifically
         """
         self.output_dir = output_dir
         self.frames_dir = frames_dir
         self.video_name = video_name
         self.use_cache = use_cache
+        self.skip_cache_frames = skip_cache_frames
+        self.skip_cache_whisper = skip_cache_whisper
+        self.skip_cache_analysis = skip_cache_analysis
 
     def get_whisper_cache(self) -> Optional[Path]:
         """
@@ -34,7 +42,7 @@ class CacheManager:
         Returns:
             Path to cached transcript or None
         """
-        if not self.use_cache:
+        if not self.use_cache or self.skip_cache_whisper:
             return None
         cache_path = self.output_dir / f"{self.video_name}.json"
@@ -51,7 +59,7 @@ class CacheManager:
         Returns:
             List of (frame_path, timestamp) tuples or None
         """
-        if not self.use_cache or not self.frames_dir.exists():
+        if not self.use_cache or self.skip_cache_frames or not self.frames_dir.exists():
             return None
         existing_frames = list(self.frames_dir.glob("frame_*.jpg"))
@@ -84,7 +92,7 @@ class CacheManager:
         Returns:
             List of analysis results or None
         """
-        if not self.use_cache:
+        if not self.use_cache or self.skip_cache_analysis:
             return None
         cache_path = self.output_dir / f"{self.video_name}_{analysis_type}.json"
```

View File: meetus/frame_extractor.py

```diff
@@ -6,9 +6,9 @@ import cv2
 import os
 from pathlib import Path
 from typing import List, Tuple, Optional
-import subprocess
 import json
 import logging
+import re
 
 logger = logging.getLogger(__name__)
@@ -56,7 +56,8 @@ class FrameExtractor:
             frame_filename = f"frame_{saved_count:05d}_{timestamp:.2f}s.jpg"
             frame_path = self.output_dir / frame_filename
-            cv2.imwrite(str(frame_path), frame)
+            # Use high quality for text readability (95 = high quality JPEG)
+            cv2.imwrite(str(frame_path), frame, [cv2.IMWRITE_JPEG_QUALITY, 95])
             frames_info.append((str(frame_path), timestamp))
             saved_count += 1
@@ -66,41 +67,51 @@ class FrameExtractor:
         logger.info(f"Extracted {saved_count} frames at {interval_seconds}s intervals")
         return frames_info
 
-    def extract_scene_changes(self, threshold: float = 30.0) -> List[Tuple[str, float]]:
+    def extract_scene_changes(self, threshold: float = 15.0) -> List[Tuple[str, float]]:
         """
         Extract frames only on scene changes using FFmpeg.
         More efficient than interval-based extraction.
 
         Args:
             threshold: Scene change detection threshold (0-100, lower = more sensitive)
+                Default: 15.0 (good for clean UIs like Zed)
+                Higher values (20-30) for busy UIs like VS Code
+                Lower values (5-10) for very subtle changes
 
         Returns:
             List of (frame_path, timestamp) tuples
         """
+        try:
+            import ffmpeg
+        except ImportError:
+            raise ImportError("ffmpeg-python not installed. Run: pip install ffmpeg-python")
 
         video_name = Path(self.video_path).stem
         output_pattern = self.output_dir / f"{video_name}_%05d.jpg"
 
-        # Use FFmpeg's scene detection filter
-        cmd = [
-            'ffmpeg',
-            '-i', self.video_path,
-            '-vf', f'select=gt(scene\\,{threshold/100}),showinfo',
-            '-vsync', 'vfr',
-            '-frame_pts', '1',
-            str(output_pattern),
-            '-loglevel', 'info'
-        ]
 
         try:
-            result = subprocess.run(cmd, capture_output=True, text=True, check=True)
+            # Use FFmpeg's scene detection filter with high quality output
+            stream = ffmpeg.input(self.video_path)
+            stream = ffmpeg.filter(stream, 'select', f'gt(scene,{threshold/100})')
+            stream = ffmpeg.filter(stream, 'showinfo')
+            stream = ffmpeg.output(
+                stream,
+                str(output_pattern),
+                vsync='vfr',
+                frame_pts=1,
+                **{'q:v': '2'}  # High quality JPEG
+            )
+            # Run with stderr capture to get showinfo output
+            _, stderr = ffmpeg.run(stream, capture_stderr=True, overwrite_output=True)
+            stderr = stderr.decode('utf-8')
 
             # Parse FFmpeg output to get frame timestamps from showinfo filter
-            import re
             frames_info = []
 
             # Extract timestamps from stderr (showinfo outputs there)
             timestamp_pattern = r'pts_time:([\d.]+)'
-            timestamps = re.findall(timestamp_pattern, result.stderr)
+            timestamps = re.findall(timestamp_pattern, stderr)
 
             # Match frames to timestamps
             frame_files = sorted(self.output_dir.glob(f"{video_name}_*.jpg"))
@@ -113,11 +124,15 @@ class FrameExtractor:
             logger.info(f"Extracted {len(frames_info)} frames at scene changes")
             return frames_info
 
-        except subprocess.CalledProcessError as e:
-            logger.error(f"FFmpeg error: {e.stderr}")
+        except ffmpeg.Error as e:
+            logger.error(f"FFmpeg error: {e.stderr.decode() if e.stderr else str(e)}")
             # Fallback to interval extraction
             logger.warning("Falling back to interval extraction...")
             return self.extract_by_interval()
+        except Exception as e:
+            logger.error(f"Unexpected error during scene extraction: {e}")
+            logger.warning("Falling back to interval extraction...")
+            return self.extract_by_interval()
 
     def get_video_duration(self) -> float:
         """Get video duration in seconds."""
```

View File: meetus/workflow.py

```diff
@@ -31,10 +31,11 @@ class WorkflowConfig:
         # Whisper options
         self.run_whisper = kwargs.get('run_whisper', False)
-        self.whisper_model = kwargs.get('whisper_model', 'base')
+        self.whisper_model = kwargs.get('whisper_model', 'medium')
 
         # Frame extraction
         self.scene_detection = kwargs.get('scene_detection', False)
+        self.scene_threshold = kwargs.get('scene_threshold', 15.0)
         self.interval = kwargs.get('interval', 5)
 
         # Analysis options
@@ -46,6 +47,9 @@ class WorkflowConfig:
         # Processing options
         self.no_deduplicate = kwargs.get('no_deduplicate', False)
         self.no_cache = kwargs.get('no_cache', False)
+        self.skip_cache_frames = kwargs.get('skip_cache_frames', False)
+        self.skip_cache_whisper = kwargs.get('skip_cache_whisper', False)
+        self.skip_cache_analysis = kwargs.get('skip_cache_analysis', False)
         self.extract_only = kwargs.get('extract_only', False)
         self.format = kwargs.get('format', 'detailed')
@@ -58,7 +62,8 @@ class WorkflowConfig:
             },
             "frame_extraction": {
                 "method": "scene_detection" if self.scene_detection else "interval",
-                "interval_seconds": self.interval if not self.scene_detection else None
+                "interval_seconds": self.interval if not self.scene_detection else None,
+                "scene_threshold": self.scene_threshold if self.scene_detection else None
             },
             "analysis": {
                 "method": "vision" if self.use_vision else "ocr",
@@ -91,7 +96,10 @@ class ProcessingWorkflow:
             self.output_mgr.output_dir,
             self.output_mgr.frames_dir,
             config.video_path.stem,
-            use_cache=not config.no_cache
+            use_cache=not config.no_cache,
+            skip_cache_frames=config.skip_cache_frames,
+            skip_cache_whisper=config.skip_cache_whisper,
+            skip_cache_analysis=config.skip_cache_analysis
         )
 
     def run(self) -> Dict[str, Any]:
@@ -206,11 +214,17 @@ class ProcessingWorkflow:
         if cached_frames:
             return cached_frames
 
+        # Clean up old frames if regenerating
+        if self.config.skip_cache_frames and self.output_mgr.frames_dir.exists():
+            logger.info("Cleaning up old frames...")
+            for old_frame in self.output_mgr.frames_dir.glob("*.jpg"):
+                old_frame.unlink()
+
         # Extract frames
         extractor = FrameExtractor(str(self.config.video_path), str(self.output_mgr.frames_dir))
         if self.config.scene_detection:
-            frames_info = extractor.extract_scene_changes()
+            frames_info = extractor.extract_scene_changes(threshold=self.config.scene_threshold)
         else:
             frames_info = extractor.extract_by_interval(self.config.interval)
```

View File: process_meeting.py

```diff
@@ -72,8 +72,8 @@ Examples:
     parser.add_argument(
         '--whisper-model',
         choices=['tiny', 'base', 'small', 'medium', 'large'],
-        help='Whisper model to use (default: base)',
-        default='base'
+        help='Whisper model to use (default: medium)',
+        default='medium'
     )
 
     # Output options
@@ -100,6 +100,12 @@ Examples:
         action='store_true',
         help='Use scene detection instead of interval extraction'
     )
+    parser.add_argument(
+        '--scene-threshold',
+        type=float,
+        help='Scene detection threshold (0-100, lower=more sensitive, default: 15)',
+        default=15.0
+    )
 
     # Analysis options
     parser.add_argument(
@@ -131,6 +137,21 @@ Examples:
         action='store_true',
         help='Disable caching - reprocess everything even if outputs exist'
     )
+    parser.add_argument(
+        '--skip-cache-frames',
+        action='store_true',
+        help='Skip cached frames, re-extract from video (but keep whisper/analysis cache)'
+    )
+    parser.add_argument(
+        '--skip-cache-whisper',
+        action='store_true',
+        help='Skip cached whisper transcript, re-run transcription (but keep frames/analysis cache)'
+    )
+    parser.add_argument(
+        '--skip-cache-analysis',
+        action='store_true',
+        help='Skip cached analysis, re-run OCR/vision (but keep frames/whisper cache)'
+    )
     parser.add_argument(
         '--no-deduplicate',
         action='store_true',
```

View File: requirements.txt

```diff
@@ -1,6 +1,7 @@
 # Core dependencies
 opencv-python>=4.8.0
 Pillow>=10.0.0
+ffmpeg-python>=0.2.0
 
 # Vision analysis (recommended for better results)
 # Requires Ollama to be installed: https://ollama.ai/download
```