# Meeting Processor

Extract screen content from meeting recordings and merge it with Whisper transcripts for better AI summarization.

## Overview

This tool enhances meeting transcripts by combining:

- **Audio transcription** (from Whisper)
- **Screen content analysis** (vision models or OCR)

### Vision Analysis vs OCR

- **Vision Models** (recommended): uses a local LLaVA model via Ollama to understand context - great for dashboards, code, and consoles
- **OCR**: traditional text extraction - faster but less context-aware

The result is a rich, timestamped transcript that provides full context for AI summarization.

## Installation

### 1. System Dependencies

**Ollama** (required for vision analysis):

```bash
# Install from https://ollama.ai/download
# Then pull a vision model:
ollama pull llava:13b
# or for a lighter model:
ollama pull llava:7b
```

**FFmpeg** (for scene detection):

```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg

# macOS
brew install ffmpeg
```

**Tesseract OCR** (optional, if not using vision):

```bash
# Ubuntu/Debian
sudo apt-get install tesseract-ocr

# macOS
brew install tesseract

# Arch Linux
sudo pacman -S tesseract
```

### 2. Python Dependencies

```bash
pip install -r requirements.txt
```

### 3. Whisper (for audio transcription)

```bash
pip install openai-whisper
```

### 4. Optional: Install Alternative OCR Engines

If you prefer OCR over vision analysis:

```bash
# EasyOCR (better for rotated/handwritten text)
pip install easyocr

# PaddleOCR (better for code/terminal screens)
pip install paddleocr
```

## Quick Start

### Recommended: Vision Analysis (Best for Code/Dashboards)

```bash
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision
```

This will:

1. Run Whisper transcription (audio → text)
2. Extract frames every 5 seconds
3. Use the LLaVA vision model to analyze frames with context
4. Merge audio + screen content
5. Save everything to the `output/` folder

### Re-run with Cached Results

Already ran it once?
Re-run instantly using cached results:

```bash
# Uses cached transcript, frames, and analysis
python process_meeting.py samples/meeting.mkv --use-vision

# Force reprocessing
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --no-cache
```

### Traditional OCR (Faster, Less Context-Aware)

```bash
python process_meeting.py samples/meeting.mkv --run-whisper
```

## Usage Examples

### Vision Analysis with Context Hints

```bash
# For code-heavy meetings
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --vision-context code

# For dashboard/monitoring meetings (Grafana, GCP, etc.)
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --vision-context dashboard

# For console/terminal sessions
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --vision-context console
```

### Different Vision Models

```bash
# Lighter/faster model (7B parameters)
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --vision-model llava:7b

# Default model (13B parameters, better quality)
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --vision-model llava:13b

# Alternative models
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --vision-model bakllava
```

### Extract frames at different intervals

```bash
# Every 10 seconds
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --interval 10

# Every 3 seconds (more detailed)
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --interval 3
```

### Use scene detection (smarter, fewer frames)

```bash
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --scene-detection
```

### Traditional OCR (if you prefer)

```bash
# Tesseract (default)
python process_meeting.py samples/meeting.mkv --run-whisper

# EasyOCR
python process_meeting.py samples/meeting.mkv --run-whisper --ocr-engine easyocr

# PaddleOCR
python process_meeting.py samples/meeting.mkv --run-whisper --ocr-engine paddleocr
```

### Caching Examples

```bash
# First run - processes everything
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision

# Second run - uses cached transcript and frames, only re-merges
python process_meeting.py samples/meeting.mkv

# Switch from OCR to vision using existing frames
python process_meeting.py samples/meeting.mkv --use-vision

# Force complete reprocessing
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --no-cache
```

### Custom output location

```bash
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --output-dir my_outputs/
```

### Enable verbose logging

```bash
# Show detailed debug information
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --verbose
```

## Output Files

All output files are saved to the `output/` directory by default:

- **`output/