embed images
78
def/04-fix-whisper-cache-loading.md
Normal file
@@ -0,0 +1,78 @@
# 04 - Fix Whisper Cache Loading

## Date
2025-10-28

## Problem
The enhanced transcript did not include audio segments from cached Whisper transcripts when the run omitted the `--run-whisper` flag.

Example command that failed:
```bash
python process_meeting.py samples/zaca-run-scrapers.mkv --embed-images --scene-detection --scene-threshold 10 --skip-cache-frames -v
```

Result: The enhanced transcript contained only embedded images and no audio segments (0 "SPEAKER" entries).
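
One quick way to confirm the symptom is to count the speaker entries in the generated output. The snippet below is a minimal sketch: the helper name `count_speaker_entries` and the sample transcript lines are hypothetical, and it assumes only that each audio segment appears on a line containing the text `SPEAKER`.

```python
def count_speaker_entries(transcript_text: str, marker: str = "SPEAKER") -> int:
    """Count the lines that contain the speaker marker."""
    return sum(1 for line in transcript_text.splitlines() if marker in line)


# Hypothetical output shapes, used only to illustrate the check.
images_only = "![frame](frames/0001.jpg)\n![frame](frames/0002.jpg)\n"
with_audio = "[00:01] SPEAKER_00: hello\n![frame](frames/0001.jpg)\n"

print(count_speaker_entries(images_only))
print(count_speaker_entries(with_audio))
```

A count of zero on a video that has a cached Whisper transcript points to the problem described here.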

## Root Cause
In `workflow.py`, the `_run_whisper()` method checked the `run_whisper` flag **before** checking the cache:

```python
def _run_whisper(self) -> Optional[str]:
    if not self.config.run_whisper:
        return self.config.transcript_path  # Returns None if --transcript not specified

    # Cache check NEVER REACHED if run_whisper is False
    cached = self.cache_mgr.get_whisper_cache()
    if cached:
        return str(cached)
```

This meant:
- User runs the command without `--run-whisper`
- Method returns `transcript_path` immediately (None when `--transcript` is not given)
- Cached whisper transcript is never discovered
- No audio segments in enhanced output
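
The sequence above can be reproduced in isolation. This is a sketch under stated assumptions: `StubConfig`, `StubCacheManager`, and the cache path are hypothetical stand-ins for the real config and cache manager, and only the flag-before-cache ordering is taken from the original method.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class StubConfig:
    """Hypothetical stand-in for the workflow config."""
    run_whisper: bool = False
    transcript_path: Optional[str] = None


class StubCacheManager:
    """Hypothetical stand-in that reports a fixed cache entry."""

    def __init__(self, cached: Optional[Path]) -> None:
        self._cached = cached

    def get_whisper_cache(self) -> Optional[Path]:
        return self._cached


def run_whisper_old(config: StubConfig, cache_mgr: StubCacheManager) -> Optional[str]:
    # Old ordering: the flag check comes first, so the cache is never consulted.
    if not config.run_whisper:
        return config.transcript_path
    cached = cache_mgr.get_whisper_cache()
    if cached:
        return str(cached)
    return None  # placeholder for the actual transcription step


# A cache entry exists, but neither --run-whisper nor --transcript was passed.
old_result = run_whisper_old(StubConfig(), StubCacheManager(Path("cache/whisper.json")))
print(old_result)  # None: the cached transcript is ignored
```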

## Solution
Reorder the logic to check cache **first**, regardless of flags:

```python
def _run_whisper(self) -> Optional[str]:
    """Run Whisper transcription if requested, or use cached/provided transcript."""
    # First, check cache (regardless of run_whisper flag)
    cached = self.cache_mgr.get_whisper_cache()
    if cached:
        return str(cached)

    # If no cache and not running whisper, use provided transcript path (if any)
    if not self.config.run_whisper:
        return self.config.transcript_path

    # If no cache and run_whisper is True, run whisper transcription
    # ... rest of whisper code
```

## New Behavior
1. Cache is checked first (regardless of `--run-whisper` flag)
2. If cached whisper exists, use it
3. If no cache and `--run-whisper` not specified, use `--transcript` path (or None)
4. If no cache and `--run-whisper` specified, run whisper
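
The four steps can be exercised as a standalone function. This is a sketch, not the project's code: `resolve_transcript`, its parameters, and the `transcribe` callable are hypothetical names that mirror the ordering in `_run_whisper()` without depending on the real config or cache manager.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_transcript(
    run_whisper: bool,
    transcript_path: Optional[str],
    cached: Optional[Path],
    transcribe: Callable[[], str],
) -> Optional[str]:
    """Pick the transcript source using the cache-first ordering."""
    # Steps 1-2: a cache hit wins regardless of the run_whisper flag.
    if cached:
        return str(cached)
    # Step 3: no cache and no whisper run, so fall back to the provided path.
    if not run_whisper:
        return transcript_path
    # Step 4: no cache and whisper requested, so transcribe.
    return transcribe()


def fake_transcribe() -> str:
    # Hypothetical stand-in for the real transcription step.
    return "output/fresh-whisper.json"


cache_hit = Path("cache/whisper.json")

from_cache = resolve_transcript(False, None, cache_hit, fake_transcribe)
from_flag = resolve_transcript(False, "manual.json", None, fake_transcribe)
nothing = resolve_transcript(False, None, None, fake_transcribe)
fresh = resolve_transcript(True, None, None, fake_transcribe)
```

With this ordering a cache hit also takes precedence over an explicit `--transcript` path, which may or may not be the intended priority.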

## Benefits
- ✓ Cached whisper transcripts are always discovered and used
- ✓ User can iterate on frame extraction/analysis without re-running whisper
- ✓ Enhanced transcripts now properly include both audio + visual content
- ✓ Granular cache flags (`--skip-cache-frames`, `--skip-cache-whisper`) work as expected

## Use Case
```bash
# First run: Generate whisper transcript + extract frames
python process_meeting.py samples/video.mkv --run-whisper --embed-images --scene-detection -v

# Second run: Iterate on scene threshold without re-running whisper
python process_meeting.py samples/video.mkv --embed-images --scene-detection --scene-threshold 10 --skip-cache-frames -v
# Now correctly includes cached whisper transcript in enhanced output!
```

## Files Modified
- `meetus/workflow.py` - Reordered logic in `_run_whisper()` method (lines 172-181)