mitus/def/04-fix-whisper-cache-loading.md
Mariano Gabriel 118ef04223 embed images
2025-10-28 08:02:45 -03:00

04 - Fix Whisper Cache Loading

Date

2025-10-28

Problem

The enhanced transcript omitted the audio segments from a cached Whisper transcript whenever the command was run without the --run-whisper flag.

Example command that failed:

python process_meeting.py samples/zaca-run-scrapers.mkv --embed-images --scene-detection --scene-threshold 10 --skip-cache-frames -v

Result: the enhanced transcript contained only embedded images and no audio segments (0 "SPEAKER" entries).

Root Cause

In workflow.py, the _run_whisper() method checked the run_whisper flag before checking the cache, so it returned early whenever the flag was off:

def _run_whisper(self) -> Optional[str]:
    if not self.config.run_whisper:
        return self.config.transcript_path  # Returns None if --transcript not specified

    # Cache check NEVER REACHED if run_whisper is False
    cached = self.cache_mgr.get_whisper_cache()
    if cached:
        return str(cached)

This meant:

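  • The user runs the command without --run-whisper
  • The method returns config.transcript_path immediately, which is None when --transcript is not given
  • The cache lookup is skipped, so the cached Whisper transcript is never discovered
  • The enhanced output contains no audio segments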
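The early return can be reproduced in isolation. The sketch below is a stand-in rather than the project's code: `FakeConfig`, `FakeCache`, `old_run_whisper`, and the cache path are hypothetical names that mirror only the ordering shown above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class FakeConfig:
    # Hypothetical stand-in for the real config object
    run_whisper: bool = False
    transcript_path: Optional[str] = None


class FakeCache:
    # Hypothetical stand-in for the cache manager
    def __init__(self, cached: Optional[Path]):
        self._cached = cached

    def get_whisper_cache(self) -> Optional[Path]:
        return self._cached


def old_run_whisper(config: FakeConfig, cache: FakeCache) -> Optional[str]:
    # Same ordering as the buggy method: flag first, cache second
    if not config.run_whisper:
        return config.transcript_path
    cached = cache.get_whisper_cache()
    if cached:
        return str(cached)
    return None  # the real method would run Whisper here


# A cache entry exists in both calls (hypothetical path)
cache = FakeCache(Path("output/video.json"))
result_without_flag = old_run_whisper(FakeConfig(run_whisper=False), cache)
result_with_flag = old_run_whisper(FakeConfig(run_whisper=True), cache)

print(result_without_flag)  # None: the cache was never consulted
print(result_with_flag)     # the cached path, found only because the flag was set
```

Even though a cache entry exists in both calls, only the call with the flag set ever reaches the lookup.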

Solution

Reorder the logic to check cache first, regardless of flags:

def _run_whisper(self) -> Optional[str]:
    """Run Whisper transcription if requested, or use cached/provided transcript."""
    # First, check cache (regardless of run_whisper flag)
    cached = self.cache_mgr.get_whisper_cache()
    if cached:
        return str(cached)

    # If no cache and not running whisper, use provided transcript path (if any)
    if not self.config.run_whisper:
        return self.config.transcript_path

    # If no cache and run_whisper is True, run whisper transcription
    # ... rest of whisper code

New Behavior

  1. The cache is checked first (regardless of the --run-whisper flag)
  2. If a cached Whisper transcript exists, it is used
  3. If there is no cache and --run-whisper is not specified, the --transcript path is used (or None)
  4. If there is no cache and --run-whisper is specified, Whisper is run
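The order listed above can be written as a small decision function, which makes each combination easy to check. This is a minimal sketch under assumptions: `resolve_transcript_source` and the sample paths are hypothetical and are not part of workflow.py.

```python
from typing import Optional, Tuple


def resolve_transcript_source(
    cached_path: Optional[str],
    run_whisper: bool,
    transcript_path: Optional[str],
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return (source, path) in the order listed above."""
    if cached_path:                    # steps 1-2: a cache hit always wins
        return ("cache", cached_path)
    if not run_whisper:                # step 3: fall back to --transcript (may be None)
        return ("provided", transcript_path)
    return ("whisper", None)           # step 4: the caller would run Whisper


# (cached_path, run_whisper, transcript_path); sample values are made up
cases = [
    ("out/a.json", False, None),
    ("out/a.json", True, "t.json"),
    (None, False, "t.json"),
    (None, False, None),
    (None, True, None),
]
results = [resolve_transcript_source(*case) for case in cases]
for case, result in zip(cases, results):
    print(case, "->", result)
```

One consequence that follows from this ordering: when a cache entry exists, it takes precedence over both --run-whisper and an explicit --transcript path.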

Benefits

✓ Cached whisper transcripts are always discovered and used
✓ User can iterate on frame extraction/analysis without re-running whisper
✓ Enhanced transcripts now properly include both audio + visual content
✓ Granular cache flags (--skip-cache-frames, --skip-cache-whisper) work as expected

Use Case

# First run: Generate whisper transcript + extract frames
python process_meeting.py samples/video.mkv --run-whisper --embed-images --scene-detection -v

# Second run: Iterate on scene threshold without re-running whisper
python process_meeting.py samples/video.mkv --embed-images --scene-detection --scene-threshold 10 --skip-cache-frames -v
# Now correctly includes cached whisper transcript in enhanced output!

Files Modified

  • meetus/workflow.py - Reordered logic in _run_whisper() method (lines 172-181)