Meeting Processor
Extract screen content from meeting recordings and merge with Whisper transcripts for better Claude summarization.
Overview
This tool enhances meeting transcripts by combining:
- Audio transcription (from Whisper)
- Screen content (OCR from screen shares)
The result is a rich, timestamped transcript that provides full context for AI summarization.
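The merge described above can be pictured as interleaving two timestamped streams. The following is a minimal sketch, not the code in transcript_merger.py; the `Entry` type, the function names, and the `[MM:SS] SOURCE: text` line layout are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    time: float   # seconds from the start of the recording
    source: str   # "audio" or "screen" (hypothetical labels)
    text: str


def merge_timeline(audio, screen):
    """Interleave (time, text) pairs from both sources in chronological order."""
    entries = [Entry(t, "audio", txt) for t, txt in audio]
    entries += [Entry(t, "screen", txt) for t, txt in screen]
    # sorted() is stable, so audio entries stay ahead of screen entries on ties
    return sorted(entries, key=lambda e: e.time)


def format_timestamp(seconds):
    """Render seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render(entries):
    """Produce one line per entry (assumed layout, for illustration only)."""
    return "\n".join(
        f"[{format_timestamp(e.time)}] {e.source.upper()}: {e.text}"
        for e in entries
    )
```

Calling `render(merge_timeline(audio, screen))` gives one chronological text block in which spoken words and on-screen text alternate according to their timestamps.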
Installation
1. System Dependencies
Tesseract OCR (recommended):
# Ubuntu/Debian
sudo apt-get install tesseract-ocr
# macOS
brew install tesseract
# Arch Linux
sudo pacman -S tesseract
FFmpeg (for scene detection):
# Ubuntu/Debian
sudo apt-get install ffmpeg
# macOS
brew install ffmpeg
2. Python Dependencies
pip install -r requirements.txt
3. Optional: Install Alternative OCR Engines
# EasyOCR (better for rotated/handwritten text)
pip install easyocr
# PaddleOCR (better for code/terminal screens)
pip install paddleocr
Quick Start
Basic Usage (Screen Content Only)
python process_meeting.py samples/meeting.mkv
This will:
- Extract frames every 5 seconds
- Run OCR to extract screen text
- Save enhanced transcript to meeting_enhanced.txt
With Whisper Transcript
First, generate a Whisper transcript:
whisper samples/meeting.mkv --model base --output_format json
Then process with screen content:
python process_meeting.py samples/meeting.mkv --transcript samples/meeting.json
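For orientation, Whisper's JSON output holds a top-level `segments` list whose items carry `start`, `end`, and `text` fields. A minimal sketch of reading those fields follows; the function names are hypothetical, and this is not necessarily how process_meeting.py parses the file.

```python
import json


def segments_from_dict(data):
    """Return (start, end, text) tuples from a parsed Whisper JSON document."""
    return [
        (seg["start"], seg["end"], seg["text"].strip())
        for seg in data.get("segments", [])
    ]


def load_segments(path):
    """Read a Whisper JSON file and return its segments."""
    with open(path, encoding="utf-8") as f:
        return segments_from_dict(json.load(f))
```

The start times are the natural key for lining up spoken text with the frame timestamps produced by OCR.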
Usage Examples
Extract frames at different intervals
# Every 10 seconds
python process_meeting.py samples/meeting.mkv --interval 10
# Every 3 seconds (more detailed)
python process_meeting.py samples/meeting.mkv --interval 3
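When choosing an interval, it helps to estimate how many frames will be extracted and passed to OCR. The sketch below assumes one frame at time 0 and one every `interval` seconds after that; the tool's exact sampling may differ, and `frame_timestamps` is a hypothetical helper.

```python
import math


def frame_timestamps(duration_seconds, interval):
    """Timestamps (in seconds) at which a frame would be sampled."""
    count = math.ceil(duration_seconds / interval)
    return [i * interval for i in range(count)]


# A one-hour recording at the default 5-second interval
print(len(frame_timestamps(3600, 5)))
```

Under this assumption, a one-hour recording yields 720 frames at 5 seconds, 360 at 10 seconds, and 1200 at 3 seconds.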
Use scene detection (smarter, fewer frames)
python process_meeting.py samples/meeting.mkv --scene-detection
Use different OCR engines
# EasyOCR (good for varied layouts)
python process_meeting.py samples/meeting.mkv --ocr-engine easyocr
# PaddleOCR (good for code/terminal)
python process_meeting.py samples/meeting.mkv --ocr-engine paddleocr
Extract frames only (no merging)
python process_meeting.py samples/meeting.mkv --extract-only
Custom output location
python process_meeting.py samples/meeting.mkv --output my_meeting.txt --frames-dir my_frames/
Enable verbose logging
# Show detailed debug information
python process_meeting.py samples/meeting.mkv --verbose
# Short form
python process_meeting.py samples/meeting.mkv -v
Output Files
After processing, you'll get:
- <video>_enhanced.txt - Enhanced transcript ready for Claude
- <video>_ocr.json - Raw OCR data with timestamps
- frames/ - Extracted video frames (JPG files)
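The `<video>` placeholder appears to be the video file name without its extension (for example, samples/meeting.mkv produces meeting_enhanced.txt). A small sketch of that naming rule, assuming the names are derived from the file stem; `default_outputs` is a hypothetical helper, and the directory the files land in is not specified here.

```python
from pathlib import Path


def default_outputs(video_path):
    """Derive the default output names from the video file stem (assumed rule)."""
    stem = Path(video_path).stem
    return {
        "transcript": f"{stem}_enhanced.txt",
        "ocr": f"{stem}_ocr.json",
        "frames": "frames/",
    }
```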
Workflow for Meeting Analysis
Complete Workflow
# 1. Extract audio and transcribe with Whisper
whisper samples/alo-intro1.mkv --model base --output_format json
# 2. Process video to extract screen content
python process_meeting.py samples/alo-intro1.mkv \
--transcript samples/alo-intro1.json \
--scene-detection
# 3. Use the enhanced transcript with Claude
# Copy the content from alo-intro1_enhanced.txt and paste into Claude
Example Prompt for Claude
Please summarize this meeting transcript. Pay special attention to:
1. Key decisions made
2. Action items
3. Technical details shown on screen
4. Any metrics or data presented
[Paste enhanced transcript here]
Command Reference
usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--output OUTPUT]
                          [--frames-dir FRAMES_DIR] [--interval INTERVAL]
                          [--scene-detection]
                          [--ocr-engine {tesseract,easyocr,paddleocr}]
                          [--no-deduplicate] [--extract-only]
                          [--format {detailed,compact}] [--verbose]
                          video
Options:
  video               Path to video file
  --transcript, -t    Path to Whisper transcript (JSON or TXT)
  --output, -o        Output file for enhanced transcript
  --frames-dir        Directory to save extracted frames (default: frames/)
  --interval          Extract frame every N seconds (default: 5)
  --scene-detection   Use scene detection instead of interval extraction
  --ocr-engine        OCR engine: tesseract, easyocr, paddleocr (default: tesseract)
  --no-deduplicate    Disable text deduplication
  --extract-only      Only extract frames and OCR, skip transcript merging
  --format            Output format: detailed or compact (default: detailed)
  --verbose, -v       Enable verbose logging (DEBUG level)
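The options above map directly onto an argparse parser. The sketch below reconstructs one from the reference text; it is not the parser shipped in process_meeting.py, and treating `--interval` as an integer is an assumption.

```python
import argparse


def build_parser():
    """Rebuild the documented command-line interface (illustrative only)."""
    p = argparse.ArgumentParser(prog="process_meeting.py")
    p.add_argument("video", help="Path to video file")
    p.add_argument("--transcript", "-t",
                   help="Path to Whisper transcript (JSON or TXT)")
    p.add_argument("--output", "-o",
                   help="Output file for enhanced transcript")
    p.add_argument("--frames-dir", default="frames/",
                   help="Directory to save extracted frames")
    # Assumption: whole seconds; the real script may accept fractions
    p.add_argument("--interval", type=int, default=5,
                   help="Extract frame every N seconds")
    p.add_argument("--scene-detection", action="store_true",
                   help="Use scene detection instead of interval extraction")
    p.add_argument("--ocr-engine", default="tesseract",
                   choices=["tesseract", "easyocr", "paddleocr"])
    p.add_argument("--no-deduplicate", action="store_true",
                   help="Disable text deduplication")
    p.add_argument("--extract-only", action="store_true",
                   help="Only extract frames and OCR, skip transcript merging")
    p.add_argument("--format", default="detailed",
                   choices=["detailed", "compact"])
    p.add_argument("--verbose", "-v", action="store_true",
                   help="Enable verbose logging (DEBUG level)")
    return p


args = build_parser().parse_args(["samples/meeting.mkv", "--interval", "10", "-v"])
print(args.interval, args.ocr_engine, args.verbose)
```

argparse converts dashes in long option names to underscores, so `--frames-dir` is read back as `args.frames_dir`.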
Tips for Best Results
Scene Detection vs Interval
- Scene detection: Better for presentations with distinct slides. More efficient.
- Interval extraction: Better for continuous screen sharing (coding, browsing). More thorough.
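To make the two strategies concrete, the sketch below builds the kind of FFmpeg command each one could use: the `fps` filter samples at a fixed rate, while the `select` filter with the `scene` score keeps only frames that differ enough from the previous one. The helper names, the 0.3 threshold, and the output file pattern are assumptions; the commands frame_extractor.py actually runs may differ.

```python
def interval_command(video, frames_dir, interval=5):
    """FFmpeg arguments that write one frame every `interval` seconds."""
    return [
        "ffmpeg", "-i", video,
        "-vf", f"fps=1/{interval}",
        f"{frames_dir}/frame_%05d.jpg",
    ]


def scene_command(video, frames_dir, threshold=0.3):
    """FFmpeg arguments that keep frames whose scene-change score exceeds threshold."""
    return [
        "ffmpeg", "-i", video,
        # The quotes protect the comma inside gt(...) from the filter parser
        "-vf", f"select='gt(scene,{threshold})'",
        # Emit only the selected frames instead of padding to a constant rate;
        # newer FFmpeg releases also offer -fps_mode vfr for this
        "-vsync", "vfr",
        f"{frames_dir}/scene_%05d.jpg",
    ]


print(" ".join(interval_command("samples/meeting.mkv", "frames")))
```

Either list can be passed to `subprocess.run(cmd, check=True)` once the frames directory exists. A lower threshold keeps more frames, which moves scene detection closer to interval extraction in thoroughness.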
OCR Engine Selection
- Tesseract: Best for clean slides, documents, presentations. Fast and lightweight.
- EasyOCR: Better for handwriting, rotated text, or varied fonts.
- PaddleOCR: Excellent for code, terminal outputs, and mixed languages.
Deduplication
- Enabled by default - removes similar consecutive frames
- Disable with --no-deduplicate if slides change subtly
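One way to picture deduplication is to compare the OCR text of each frame with the last frame that was kept and drop it when the two are nearly identical. The sketch below uses `difflib.SequenceMatcher` from the standard library; the 0.9 threshold and the `(time, text)` tuples are assumptions, and the tool's own similarity measure may be different.

```python
from difflib import SequenceMatcher


def deduplicate(entries, threshold=0.9):
    """Drop entries whose text closely matches the previously kept entry.

    entries: list of (time_seconds, ocr_text) in chronological order.
    """
    kept = []
    for time, text in entries:
        if kept:
            similarity = SequenceMatcher(None, kept[-1][1], text).ratio()
            if similarity >= threshold:
                continue  # near-duplicate of the last kept frame
        kept.append((time, text))
    return kept
```

This also shows why subtle slide changes can get lost: a slide that differs from its predecessor by only a few characters scores above the threshold and is discarded, which is the case where --no-deduplicate helps.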
Troubleshooting
"pytesseract not installed"
pip install pytesseract
sudo apt-get install tesseract-ocr # Don't forget system package!
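Since the Python package and the system binary are installed separately, a quick check of both can save some guessing. This is a small standalone sketch using only the standard library; `check_dependencies` is a hypothetical helper and not part of the tool.

```python
import importlib.util
import shutil


def check_dependencies():
    """Report which required binaries and Python packages can be found."""
    return {
        "tesseract binary": shutil.which("tesseract") is not None,
        "ffmpeg binary": shutil.which("ffmpeg") is not None,
        "pytesseract package": importlib.util.find_spec("pytesseract") is not None,
    }


for name, found in check_dependencies().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```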
"No frames extracted"
- Check video file is valid: ffmpeg -i video.mkv
- Try lower interval: --interval 3
- Check disk space in frames directory
Poor OCR quality
- Try different OCR engine
- Check if video resolution is sufficient
- Use --no-deduplicate to keep more frames
Scene detection not working
- The tool falls back to interval extraction automatically
- Ensure FFmpeg is installed
- Try manual interval: --interval 5
Project Structure
meetus/
├── meetus/ # Main package
│ ├── __init__.py
│ ├── frame_extractor.py # Video frame extraction
│ ├── ocr_processor.py # OCR processing
│ └── transcript_merger.py # Transcript merging
├── process_meeting.py # Main CLI script
├── requirements.txt # Python dependencies
└── README.md # This file
License
For personal use.