# Meeting Processor

Extract screen content from meeting recordings and merge with Whisper/WhisperX transcripts for better AI summarization.

## Overview

This tool enhances meeting transcripts by combining:

- **Audio transcription** (Whisper or WhisperX with speaker diarization)
- **Screen content extraction** via FFmpeg scene detection
- **Frame embedding** for direct LLM analysis

The result is a rich, timestamped transcript with embedded screen frames that provides full context for AI summarization.

## Installation

### 1. System Dependencies

**FFmpeg** (required for scene detection and frame extraction):

```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg

# macOS
brew install ffmpeg
```

### 2. Python Dependencies

```bash
pip install -r requirements.txt
```

### 3. Whisper or WhisperX (for audio transcription)

**Standard Whisper:**

```bash
pip install openai-whisper
```

**WhisperX** (recommended - includes speaker diarization):

```bash
pip install whisperx
```

For speaker diarization, you'll need a HuggingFace token with access to pyannote models.

## Quick Start

### Recommended Usage

```bash
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
```

This will:

1. Run WhisperX transcription with speaker diarization
2. Extract frames at scene changes (threshold 10 = moderately sensitive)
3. Create an enhanced transcript with frame file references
4. Save everything to the `output/` folder

The `--embed-images` flag adds frame paths to the transcript (e.g., `Frame: frames/video_00257.jpg`). This keeps the transcript small, while the frames themselves stay in the `frames/` folder for LLM access.

### Re-run with Cached Results

Already ran it once?
Re-run instantly using cached results:

```bash
# Uses cached transcript and frames
python process_meeting.py samples/meeting.mkv --embed-images

# Skip only specific cached items
python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-frames
python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-whisper

# Force complete reprocessing
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --no-cache
```

## Usage Examples

### Scene Detection Options

```bash
# Default threshold (15)
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize

# More sensitive (more frames, threshold: 5)
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --diarize

# Less sensitive (fewer frames, threshold: 30)
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 30 --diarize
```

### Fixed Interval Extraction (alternative to scene detection)

```bash
# Every 10 seconds
python process_meeting.py samples/meeting.mkv --embed-images --interval 10 --diarize

# Every 3 seconds (more detailed)
python process_meeting.py samples/meeting.mkv --embed-images --interval 3 --diarize
```

### Caching Examples

```bash
# First run - processes everything
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize

# Iterate on scene threshold (reuse whisper transcript)
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --skip-cache-frames --skip-cache-analysis

# Re-run whisper only
python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-whisper

# Force complete reprocessing
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --no-cache
```

### Custom output location

```bash
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --output-dir my_outputs/
```

### Enable verbose logging

```bash
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --verbose
```

## Output Files

Each video gets its own timestamped output directory:

```
output/
└── 20241019_143022-meeting/
    ├── manifest.json          # Processing configuration
    ├── meeting_enhanced.txt   # Enhanced transcript for AI
    ├── meeting.json           # Whisper/WhisperX transcript
    └── frames/                # Extracted video frames
        ├── frame_00001_5.00s.jpg
        ├── frame_00002_10.00s.jpg
        └── ...
```

### Caching Behavior

The tool automatically reuses the most recent output directory for the same video:

- **First run**: Creates new timestamped directory (e.g., `20241019_143022-meeting/`)
- **Subsequent runs**: Reuses the same directory and cached results
- **Cached items**: Whisper transcript, extracted frames, analysis results

**Fine-grained cache control:**
- `--no-cache`: Force complete reprocessing
- `--skip-cache-frames`: Re-extract frames only
- `--skip-cache-whisper`: Re-run transcription only
- `--skip-cache-analysis`: Re-run analysis only

This allows you to iterate on scene detection thresholds without re-running Whisper!

## Workflow for Meeting Analysis

### Complete Workflow (One Command!)

```bash
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
```

### Typical Iterative Workflow

```bash
# First run - full processing
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize

# Adjust scene threshold (keeps cached whisper transcript)
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --skip-cache-frames --skip-cache-analysis
```

### Example Prompt for Claude

```
Please summarize this meeting transcript. Pay special attention to:
1. Key decisions made
2. Action items
3. Technical details shown on screen
4. Any metrics or data presented

[Paste enhanced transcript here]
```

## Command Reference

```
usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--run-whisper]
                          [--whisper-model {tiny,base,small,medium,large}]
                          [--diarize] [--output OUTPUT]
                          [--output-dir OUTPUT_DIR] [--interval INTERVAL]
                          [--scene-detection]
                          [--scene-threshold SCENE_THRESHOLD]
                          [--embed-images] [--embed-quality EMBED_QUALITY]
                          [--no-cache] [--skip-cache-frames]
                          [--skip-cache-whisper] [--skip-cache-analysis]
                          [--no-deduplicate] [--extract-only]
                          [--format {detailed,compact}] [--verbose]
                          video

Main Options:
  video                  Path to video file
  --diarize              Use WhisperX with speaker diarization
  --embed-images         Add frame file references to transcript (recommended)

Frame Extraction:
  --scene-detection      Use FFmpeg scene detection (recommended)
  --scene-threshold      Detection sensitivity 0-100 (default: 15, lower=more sensitive)
  --interval             Extract frame every N seconds (alternative to scene detection)

Caching:
  --no-cache             Force complete reprocessing
  --skip-cache-frames    Re-extract frames only
  --skip-cache-whisper   Re-run transcription only
  --skip-cache-analysis  Re-run analysis only

Other:
  --run-whisper          Run Whisper (without diarization)
  --whisper-model        Whisper model: tiny, base, small, medium, large (default: medium)
  --transcript, -t       Path to existing Whisper transcript (JSON or TXT)
  --output, -o           Output file for enhanced transcript
  --output-dir           Directory for output files (default: output/)
  --verbose, -v          Enable verbose logging
```

## Tips for Best Results

### Scene Detection vs Interval

- **Scene detection** (`--scene-detection`): Recommended. Captures frames when content changes. More efficient.
- **Interval extraction** (`--interval N`): Alternative for continuous content. Captures every N seconds.

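
To make the difference between the two modes concrete, here is a minimal sketch of the FFmpeg arguments each mode could translate to. It is an illustration, not the tool's actual implementation: the helper name `build_ffmpeg_args`, the `frame_%05d.jpg` output pattern, and the idea that the 0-100 threshold maps onto FFmpeg's 0-1 scene score by dividing by 100 are all assumptions. The `select`/`scene` and `fps` filters themselves are standard FFmpeg.

```python
from typing import List, Optional


def build_ffmpeg_args(
    video: str,
    frames_dir: str,
    scene_threshold: Optional[float] = None,
    interval: Optional[int] = None,
) -> List[str]:
    """Return an FFmpeg argument list for one of the two extraction modes.

    Hypothetical helper: process_meeting.py may build its command differently.
    """
    if (scene_threshold is None) == (interval is None):
        raise ValueError("pass exactly one of scene_threshold or interval")

    if scene_threshold is not None:
        if not 0 <= scene_threshold <= 100:
            raise ValueError("scene_threshold must be between 0 and 100")
        # Assumption: the 0-100 scale maps linearly onto FFmpeg's 0-1 scene score.
        score = scene_threshold / 100
        # The single quotes protect the comma inside gt(...) from the
        # filtergraph parser, which otherwise treats commas as separators.
        video_filter = f"select='gt(scene,{score:.2f})'"
    else:
        if interval <= 0:
            raise ValueError("interval must be a positive number of seconds")
        # fps=1/N emits one frame every N seconds.
        video_filter = f"fps=1/{interval}"

    pattern = f"{frames_dir.rstrip('/')}/frame_%05d.jpg"
    return [
        "ffmpeg", "-i", video,
        "-vf", video_filter,
        # Write only the selected frames instead of duplicating them to keep a
        # constant rate. Newer FFmpeg builds prefer "-fps_mode vfr".
        "-vsync", "vfr",
        pattern,
    ]
```

The returned list can be passed to `subprocess.run(args, check=True)`. For example, `build_ffmpeg_args("samples/meeting.mkv", "output/frames", scene_threshold=15)` selects frames whose scene score exceeds 0.15 under the assumed mapping, while `interval=10` samples one frame every 10 seconds regardless of content.
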
### Scene Detection Threshold

- Lower values (5-10): More sensitive, captures more frames
- Default (15): Good balance for most meetings
- Higher values (20-30): Less sensitive, fewer frames

### Whisper vs WhisperX

- **Whisper** (`--run-whisper`): Standard transcription, fast
- **WhisperX** (`--run-whisper --diarize`): Adds speaker identification, requires HuggingFace token

### Deduplication

- Enabled by default - removes similar consecutive frames
- Disable with `--no-deduplicate` if slides/screens change subtly

## Troubleshooting

### Frame Extraction Issues

**"No frames extracted"**
- Check video file is valid: `ffmpeg -i video.mkv`
- Try lower scene threshold: `--scene-threshold 5`
- Try interval extraction: `--interval 3`
- Check disk space in output directory

**Scene detection not working**
- Ensure FFmpeg is installed
- Falls back to interval extraction automatically
- Try manual interval: `--interval 5`

### Whisper/WhisperX Issues

**WhisperX diarization not working**
- Ensure you have a HuggingFace token set
- Token needs access to pyannote models
- Fall back to standard Whisper without `--diarize`

### Cache Issues

**Cache not being used**
- Ensure you're using the same video filename
- Check that output directory contains cached files
- Use `--verbose` to see what's being cached/loaded

**Want to re-run specific steps**
- `--skip-cache-frames`: Re-extract frames
- `--skip-cache-whisper`: Re-run transcription
- `--skip-cache-analysis`: Re-run analysis
- `--no-cache`: Force complete reprocessing

## Experimental Features

### OCR and Vision Analysis

OCR (`--ocr-engine`) and Vision analysis (`--use-vision`) options are available but experimental. The recommended approach is to use `--embed-images` which embeds frame references directly in the transcript, letting your LLM analyze the images.

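
The recommended `--embed-images` approach amounts to interleaving frame paths with transcript segments by timestamp. The sketch below shows one way this could work; it is not the code in `transcript_merger.py`. The function names, the segment keys (`start`, `text`, `speaker`), the `UNKNOWN` fallback label, and the `[MM:SS]` timestamp format are assumptions. Only the `Frame: frames/...` reference line follows the format described in this README.

```python
from typing import Dict, List, Tuple


def format_ts(seconds: float) -> str:
    """Render a time offset in seconds as [MM:SS] (assumed display format)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"[{minutes:02d}:{secs:02d}]"


def merge_transcript_and_frames(
    segments: List[Dict],
    frames: List[Tuple[float, str]],
) -> str:
    """Interleave speech segments and frame references in time order.

    Hypothetical helper; `segments` are dicts with a `start` time in seconds,
    a `text` string, and an optional `speaker` label. `frames` are
    (timestamp_seconds, relative_path) pairs.
    """
    events = []
    for seg in segments:
        speaker = seg.get("speaker", "UNKNOWN")
        line = f"{format_ts(seg['start'])} {speaker}: {seg['text'].strip()}"
        events.append((seg["start"], 1, line))
    for ts, path in frames:
        # Order key 0 places a frame before speech that starts at the same time.
        events.append((ts, 0, f"{format_ts(ts)} Frame: {path}"))

    events.sort(key=lambda e: (e[0], e[1]))
    return "\n".join(line for _, _, line in events)
```

Because only paths are written into the transcript, the text stays small and the images can be supplied to the LLM separately from the `frames/` folder.
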
```bash
# Experimental: OCR extraction
python process_meeting.py samples/meeting.mkv --run-whisper --ocr-engine tesseract

# Experimental: Vision model analysis
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --vision-model llava:13b

# Experimental: Hybrid OpenCV + OCR
python process_meeting.py samples/meeting.mkv --run-whisper --use-hybrid
```

## Project Structure

```
meetus/
├── meetus/                      # Main package
│   ├── __init__.py
│   ├── workflow.py              # Processing orchestrator
│   ├── output_manager.py        # Output directory & manifest management
│   ├── cache_manager.py         # Caching logic
│   ├── frame_extractor.py       # Video frame extraction (FFmpeg scene detection)
│   ├── vision_processor.py      # Vision model analysis (experimental)
│   ├── ocr_processor.py         # OCR processing (experimental)
│   └── transcript_merger.py     # Transcript merging
├── process_meeting.py           # Main CLI script
├── requirements.txt             # Python dependencies
├── output/                      # Timestamped output directories
│   └── YYYYMMDD_HHMMSS-video/   # Auto-generated per video
├── samples/                     # Sample videos (gitignored)
└── README.md                    # This file
```

## License

For personal use.