diff --git a/README.md b/README.md
index 4a1d7ab..471e07b 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,17 @@
 # Meeting Processor
 
-Extract screen content from meeting recordings and merge with Whisper transcripts for better Claude summarization.
+Extract screen content from meeting recordings and merge with Whisper transcripts for better AI summarization.
 
 ## Overview
 
 This tool enhances meeting transcripts by combining:
 - **Audio transcription** (from Whisper)
-- **Screen content** (OCR from screen shares)
+- **Screen content analysis** (vision models or OCR)
+
+### Vision Analysis vs OCR
+
+- **Vision Models** (recommended): Uses a local LLaVA model via Ollama to understand context - great for dashboards, code, and consoles
+- **OCR**: Traditional text extraction - faster but less context-aware
 
 The result is a rich, timestamped transcript that provides full context for AI summarization.
 
@@ -14,16 +19,13 @@ The result is a rich, timestamped transcript that provides full context for AI s
 
 ### 1. System Dependencies
 
-**Tesseract OCR** (recommended):
+**Ollama** (required for vision analysis):
 ```bash
-# Ubuntu/Debian
-sudo apt-get install tesseract-ocr
-
-# macOS
-brew install tesseract
-
-# Arch Linux
-sudo pacman -S tesseract
+# Install from https://ollama.ai/download
+# Then pull a vision model:
+ollama pull llava:13b
+# or a lighter model:
+ollama pull llava:7b
 ```
 
 **FFmpeg** (for scene detection):
@@ -35,6 +37,18 @@ sudo apt-get install ffmpeg
 brew install ffmpeg
 ```
 
+**Tesseract OCR** (optional, if not using vision):
+```bash
+# Ubuntu/Debian
+sudo apt-get install tesseract-ocr
+
+# macOS
+brew install tesseract
+
+# Arch Linux
+sudo pacman -S tesseract
+```
+
 ### 2. Python Dependencies
 
 ```bash
@@ -49,6 +63,7 @@ pip install openai-whisper
 ### 4. Optional: Install Alternative OCR Engines
 
+If you prefer OCR over vision analysis:
 ```bash
 # EasyOCR (better for rotated/handwritten text)
 pip install easyocr
 
@@ -59,118 +74,173 @@ pip install paddleocr
 
 ## Quick Start
 
-### Recommended: Run Everything in One Command
+### Recommended: Vision Analysis (Best for Code/Dashboards)
 
 ```bash
-python process_meeting.py samples/meeting.mkv --run-whisper
+python process_meeting.py samples/meeting.mkv --run-whisper --use-vision
 ```
 
 This will:
 1. Run Whisper transcription (audio → text)
 2. Extract frames every 5 seconds
-3. Run OCR to extract screen text
+3. Use the LLaVA vision model to analyze frames with context
 4. Merge audio + screen content
 5. Save everything to `output/` folder
 
-### Alternative: Use Existing Whisper Transcript
+### Re-run with Cached Results
 
-If you already have a Whisper transcript:
+Already ran it once? Re-run instantly using cached results:
 
 ```bash
-python process_meeting.py samples/meeting.mkv --transcript output/meeting.json
+# Uses cached transcript, frames, and analysis
+python process_meeting.py samples/meeting.mkv --use-vision
+
+# Force reprocessing
+python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --no-cache
 ```
 
-### Screen Content Only (No Audio)
+### Traditional OCR (Faster, Less Context-Aware)
 
 ```bash
-python process_meeting.py samples/meeting.mkv
+python process_meeting.py samples/meeting.mkv --run-whisper
 ```
 
 ## Usage Examples
 
-### Run with different Whisper models
+### Vision Analysis with Context Hints
 
 ```bash
-# Tiny model (fastest, less accurate)
-python process_meeting.py samples/meeting.mkv --run-whisper --whisper-model tiny
+# For code-heavy meetings
+python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --vision-context code
 
-# Small model (balanced)
-python process_meeting.py samples/meeting.mkv --run-whisper --whisper-model small
+# For dashboard/monitoring meetings (Grafana, GCP, etc.)
+python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --vision-context dashboard
 
-# Large model (slowest, most accurate)
-python process_meeting.py samples/meeting.mkv --run-whisper --whisper-model large
+# For console/terminal sessions
+python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --vision-context console
+```
+
+### Different Vision Models
+
+```bash
+# Lighter/faster model (7B parameters)
+python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --vision-model llava:7b
+
+# Default model (13B parameters, better quality)
+python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --vision-model llava:13b
+
+# Alternative models
+python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --vision-model bakllava
 ```
 
 ### Extract frames at different intervals
 
 ```bash
-# Every 10 seconds (with Whisper)
-python process_meeting.py samples/meeting.mkv --run-whisper --interval 10
+# Every 10 seconds
+python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --interval 10
 
 # Every 3 seconds (more detailed)
-python process_meeting.py samples/meeting.mkv --run-whisper --interval 3
+python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --interval 3
 ```
 
 ### Use scene detection (smarter, fewer frames)
 
 ```bash
-python process_meeting.py samples/meeting.mkv --run-whisper --scene-detection
+python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --scene-detection
 ```
 
-### Use different OCR engines
+### Traditional OCR (if you prefer)
 
 ```bash
-# EasyOCR (good for varied layouts)
+# Tesseract (default)
+python process_meeting.py samples/meeting.mkv --run-whisper
+
+# EasyOCR
 python process_meeting.py samples/meeting.mkv --run-whisper --ocr-engine easyocr
 
-# PaddleOCR (good for code/terminal)
+# PaddleOCR
 python process_meeting.py samples/meeting.mkv --run-whisper --ocr-engine paddleocr
 ```
 
-### Extract frames only (no merging)
+### Caching Examples
 
 ```bash
-python process_meeting.py samples/meeting.mkv --extract-only
+# First run - processes everything
+python process_meeting.py samples/meeting.mkv --run-whisper --use-vision
+
+# Second run - uses cached transcript and frames, only re-merges
+python process_meeting.py samples/meeting.mkv
+
+# Switch from OCR to vision using existing frames
+python process_meeting.py samples/meeting.mkv --use-vision
+
+# Force complete reprocessing
+python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --no-cache
 ```
 
 ### Custom output location
 
 ```bash
-python process_meeting.py samples/meeting.mkv --run-whisper --output-dir my_outputs/
+python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --output-dir my_outputs/
 ```
 
 ### Enable verbose logging
 
 ```bash
 # Show detailed debug information
-python process_meeting.py samples/meeting.mkv --run-whisper --verbose
+python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --verbose
 ```
 
 ## Output Files
 
 All output files are saved to the `output/` directory by default:
 
-- **`output/
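Reviewer note: for anyone wondering what the new `--use-vision` path likely does per frame, Ollama exposes a documented HTTP endpoint, `POST /api/generate`, which accepts base64-encoded images for vision models such as LLaVA. The sketch below is an illustrative assumption of that flow, not `process_meeting.py`'s actual internals; the function names, prompt wording, and defaults are invented for the example.

```python
import base64
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_vision_payload(image_bytes, model="llava:13b", context_hint="code"):
    """Build the JSON body for Ollama's /api/generate endpoint.

    Illustrative sketch: prompt text and the context_hint values
    mirror the --vision-context flag, but are assumptions.
    """
    return {
        "model": model,
        "prompt": f"Describe the {context_hint} visible in this screen capture.",
        # Ollama expects images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one complete response instead of a stream
    }


def analyze_frame(image_path, model="llava:13b", context_hint="code"):
    """Send one extracted frame to the local Ollama server and return its description."""
    with open(image_path, "rb") as f:
        payload = build_vision_payload(f.read(), model, context_hint)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

`build_vision_payload` is pure, so it can be inspected or unit-tested without a running server; `analyze_frame` needs Ollama listening on its default port 11434 with the chosen model pulled.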