embed images

2025-10-28 08:02:45 -03:00
parent b1e1daf278
commit 118ef04223
12 changed files with 1016 additions and 61 deletions
--- a/def/04-fix-whisper-cache-loading.md
+++ b/def/04-fix-whisper-cache-loading.md
@@ -0,0 +1,78 @@
+# 04 - Fix Whisper Cache Loading
+
+## Date
+2025-10-28
+
+## Problem
+The enhanced transcript omitted the audio segments from cached Whisper transcripts when run without the `--run-whisper` flag.
+
+Example command that failed:
+```bash
+python process_meeting.py samples/zaca-run-scrapers.mkv --embed-images --scene-detection --scene-threshold 10 --skip-cache-frames -v
+```
+
+Result: The enhanced transcript contained only embedded images and no audio segments (0 "SPEAKER" entries).
+
+## Root Cause
+In `meetus/workflow.py`, the `_run_whisper()` method checked the `run_whisper` flag **before** checking the cache:
+
+```python
+def _run_whisper(self) -> Optional[str]:
+    if not self.config.run_whisper:
+        return self.config.transcript_path  # Returns None if --transcript not specified
+
+    # Cache check NEVER REACHED if run_whisper is False
+    cached = self.cache_mgr.get_whisper_cache()
+    if cached:
+        return str(cached)
+```
+
+As a result:
+- The user runs the command without `--run-whisper`
+- The method immediately returns `self.config.transcript_path`, which is None unless `--transcript` was given
+- The cached Whisper transcript is never discovered
+- The enhanced output contains no audio segments
+
+## Solution
+Reorder the logic to check cache **first**, regardless of flags:
+
+```python
+def _run_whisper(self) -> Optional[str]:
+    """Run Whisper transcription if requested, or use cached/provided transcript."""
+    # First, check cache (regardless of run_whisper flag)
+    cached = self.cache_mgr.get_whisper_cache()
+    if cached:
+        return str(cached)
+
+    # If no cache and not running whisper, use provided transcript path (if any)
+    if not self.config.run_whisper:
+        return self.config.transcript_path
+
+    # If no cache and run_whisper is True, run whisper transcription
+    # ... rest of whisper code
+```
+
+## New Behavior
+1. The cache is checked first (regardless of the `--run-whisper` flag)
+2. If a cached Whisper transcript exists, it is used
+3. If there is no cache and `--run-whisper` is not specified, the `--transcript` path is used (or None)
+4. If there is no cache and `--run-whisper` is specified, Whisper is run
+
+## Benefits
+- ✓ Cached Whisper transcripts are always discovered and used
+- ✓ The user can iterate on frame extraction/analysis without re-running Whisper
+- ✓ Enhanced transcripts now properly include both audio and visual content
+- ✓ Granular cache flags (`--skip-cache-frames`, `--skip-cache-whisper`) work as expected
+
+## Use Case
+```bash
+# First run: Generate whisper transcript + extract frames
+python process_meeting.py samples/video.mkv --run-whisper --embed-images --scene-detection -v
+
+# Second run: Iterate on scene threshold without re-running whisper
+python process_meeting.py samples/video.mkv --embed-images --scene-detection --scene-threshold 10 --skip-cache-frames -v
+# Now correctly includes cached whisper transcript in enhanced output!
+```
+
+## Files Modified
+- `meetus/workflow.py` - Reordered logic in `_run_whisper()` method (lines 172-181)