# Meeting Processor

Extract screen content from meeting recordings and merge with Whisper/WhisperX transcripts for better AI summarization.

## Overview

This tool enhances meeting transcripts by combining:
- **Audio transcription** (Whisper or WhisperX with speaker diarization)
- **Screen content extraction** via FFmpeg scene detection
- **Frame embedding** for direct LLM analysis

The result is a rich, timestamped transcript with embedded screen frames that provides full context for AI summarization.

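As a rough illustration of what "merging" means here, the sketch below interleaves transcript segments with frame references by timestamp. The segment fields (`start`, `text`) follow the JSON that Whisper writes; the function name, the `[FRAME: ...]` marker, and the output layout are assumptions made for illustration and may not match what `transcript_merger.py` actually produces.

```python
def merge_by_timestamp(segments, frames):
    """Interleave transcript segments and frame references in time order.

    segments: list of dicts with "start" (seconds) and "text", as in Whisper JSON.
    frames:   list of (timestamp_seconds, path) tuples.
    """
    events = [(seg["start"], 1, seg["text"].strip()) for seg in segments]
    # Frames sort before speech at the same timestamp (0 < 1), so the reader
    # sees the screen before the words spoken over it.
    events += [(ts, 0, f"[FRAME: {path}]") for ts, path in frames]
    events.sort(key=lambda e: (e[0], e[1]))
    return "\n".join(
        f"[{int(ts // 60):02d}:{int(ts % 60):02d}] {body}" for ts, _, body in events
    )


segments = [
    {"start": 0.0, "text": " Welcome everyone."},
    {"start": 7.5, "text": " Here is the dashboard."},
]
frames = [(5.0, "frames/frame_00001_5.00s.jpg")]
print(merge_by_timestamp(segments, frames))
```

In this sketch the frame captured at 5.00s lands between the two spoken segments, which is the kind of context an LLM needs to connect what was said with what was on screen.
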
## Installation

### 1. System Dependencies

**FFmpeg** (required for scene detection and frame extraction):
```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg

# macOS
brew install ffmpeg
```

### 2. Python Dependencies

```bash
pip install -r requirements.txt
```

### 3. Whisper or WhisperX (for audio transcription)

**Standard Whisper:**
```bash
pip install openai-whisper
```

**WhisperX** (recommended - includes speaker diarization):
```bash
pip install whisperx
```

For speaker diarization, you'll need a HuggingFace token with access to pyannote models.

## Quick Start

### Recommended: Embed Frames with Scene Detection

```bash
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
```

This will:
1. Run Whisper transcription (audio → text)
2. Extract frames at scene changes (smarter than fixed intervals)
3. Embed frame references in the transcript for LLM analysis
4. Save everything to the `output/` folder

### With Speaker Diarization (WhisperX)

```bash
python process_meeting.py samples/meeting.mkv --run-whisper --diarize --embed-images --scene-detection
```

This uses WhisperX to identify different speakers in the transcript.

### Re-run with Cached Results

Already ran it once? Re-run instantly using cached results:
```bash
# Uses cached transcript and frames
python process_meeting.py samples/meeting.mkv --embed-images

# Skip only specific cached items
python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-frames
python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-whisper
python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-analysis

# Force complete reprocessing
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --no-cache
```

## Usage Examples

### Scene Detection Options
```bash
# Default scene detection (threshold: 15)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection

# More sensitive (more frames captured, threshold: 5)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --scene-threshold 5

# Less sensitive (fewer frames, threshold: 30)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --scene-threshold 30
```

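For orientation, FFmpeg exposes scene detection through its `select` filter, whose `scene` score ranges from 0 to 1. The sketch below builds such a command. The mapping from this tool's 0-100 threshold to FFmpeg's 0-1 score (dividing by 100), the function name, and the output filename pattern are all assumptions, not a description of what `frame_extractor.py` does.

```python
from pathlib import Path


def build_scene_command(video, out_dir, threshold=15.0):
    """Return an ffmpeg argv that saves one JPEG per detected scene change."""
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be between 0 and 100")
    score = threshold / 100  # assumed mapping to FFmpeg's 0-1 scene score
    return [
        "ffmpeg", "-i", str(video),
        # Keep only frames whose scene-change score exceeds the threshold.
        "-vf", f"select='gt(scene,{score:.2f})'",
        # Emit frames as they are selected instead of padding to a constant
        # rate (newer FFmpeg builds spell this option -fps_mode vfr).
        "-vsync", "vfr",
        str(Path(out_dir) / "frame_%05d.jpg"),
    ]


cmd = build_scene_command("samples/meeting.mkv", "output/frames", threshold=5)
print(" ".join(cmd))
```

Under this assumed mapping, a lower threshold lets smaller visual changes through the filter, which is why `--scene-threshold 5` captures more frames than the default of 15.
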
### Fixed Interval Extraction (alternative to scene detection)
```bash
# Every 10 seconds
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --interval 10

# Every 3 seconds (more detailed)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --interval 3
```

### Frame Quality Options
```bash
# Default quality (80)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection

# Lower quality for smaller files (60)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --embed-quality 60
```

### Caching Examples
```bash
# First run - processes everything
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection

# Iterate on scene threshold (reuse whisper transcript)
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --skip-cache-frames --skip-cache-analysis

# Re-run whisper only
python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-whisper

# Force complete reprocessing
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --no-cache
```

### Custom Output Location
```bash
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --output-dir my_outputs/
```

### Enable Verbose Logging
```bash
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --verbose
```

## Output Files

Each video gets its own timestamped output directory:

```
output/
└── 20241019_143022-meeting/
    ├── manifest.json              # Processing configuration
    ├── meeting_enhanced.txt       # Enhanced transcript for AI
    ├── meeting.json               # Whisper/WhisperX transcript
    └── frames/                    # Extracted video frames
        ├── frame_00001_5.00s.jpg
        ├── frame_00002_10.00s.jpg
        └── ...
```

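Judging from the listing above, frame filenames appear to encode a sequence number and a timestamp in seconds. If you need those values in your own scripts, a small parser along these lines may help. The pattern is inferred from the two example names only, and `parse_frame_name` is a hypothetical helper rather than part of the package.

```python
import re

# Inferred from names such as frame_00001_5.00s.jpg (an assumption).
FRAME_NAME = re.compile(r"^frame_(\d+)_(\d+(?:\.\d+)?)s\.jpg$")


def parse_frame_name(name):
    """Return (index, timestamp_seconds) for a frame filename."""
    match = FRAME_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized frame name: {name!r}")
    return int(match.group(1)), float(match.group(2))


print(parse_frame_name("frame_00002_10.00s.jpg"))
```
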
### Caching Behavior

The tool automatically reuses the most recent output directory for the same video:
- **First run**: Creates new timestamped directory (e.g., `20241019_143022-meeting/`)
- **Subsequent runs**: Reuses the same directory and cached results
- **Cached items**: Whisper transcript, extracted frames, analysis results

**Fine-grained cache control:**
- `--no-cache`: Force complete reprocessing
- `--skip-cache-frames`: Re-extract frames only
- `--skip-cache-whisper`: Re-run transcription only
- `--skip-cache-analysis`: Re-run analysis only

This allows you to iterate on scene detection thresholds without re-running Whisper!

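To make the "most recent output directory for the same video" rule concrete, here is a minimal sketch of one way such a lookup could work. It assumes directories are named `<YYYYMMDD_HHMMSS>-<video stem>` as in the example above; `find_cached_dir` is a hypothetical name, and the real logic in `cache_manager.py` and `output_manager.py` may differ.

```python
from pathlib import Path


def find_cached_dir(output_root, video_path):
    """Return the newest '<timestamp>-<video stem>' directory, or None."""
    stem = Path(video_path).stem
    root = Path(output_root)
    if not root.is_dir():
        return None
    candidates = [
        d for d in root.iterdir()
        # The timestamp contains no hyphen, so everything after the first
        # hyphen is the video stem.
        if d.is_dir() and d.name.partition("-")[2] == stem
    ]
    # YYYYMMDD_HHMMSS prefixes sort chronologically as plain strings.
    return max(candidates, key=lambda d: d.name, default=None)


print(find_cached_dir("output", "samples/meeting.mkv"))
```

A lookup keyed on the filename stem would also explain why renaming the video causes a cache miss.
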
## Workflow for Meeting Analysis

### Complete Workflow (One Command!)

```bash
# Process everything in one step with scene detection
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection

# With speaker diarization
python process_meeting.py samples/meeting.mkv --run-whisper --diarize --embed-images --scene-detection
```

### Typical Iterative Workflow

```bash
# First run - full processing
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection

# Adjust scene threshold (keeps cached whisper transcript)
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --skip-cache-frames --skip-cache-analysis

# Try different frame quality
python process_meeting.py samples/meeting.mkv --embed-images --embed-quality 60 --skip-cache-frames --skip-cache-analysis
```

### Example Prompt for Claude

```
Please summarize this meeting transcript. Pay special attention to:
1. Key decisions made
2. Action items
3. Technical details shown on screen
4. Any metrics or data presented

[Paste enhanced transcript here]
```

## Command Reference

```
usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--run-whisper]
                          [--whisper-model {tiny,base,small,medium,large}]
                          [--diarize] [--output OUTPUT] [--output-dir OUTPUT_DIR]
                          [--interval INTERVAL] [--scene-detection]
                          [--scene-threshold SCENE_THRESHOLD]
                          [--embed-images] [--embed-quality EMBED_QUALITY]
                          [--no-cache] [--skip-cache-frames] [--skip-cache-whisper]
                          [--skip-cache-analysis] [--no-deduplicate]
                          [--extract-only] [--format {detailed,compact}]
                          [--verbose] video

Main Options:
  video                    Path to video file
  --run-whisper            Run Whisper transcription before processing
  --whisper-model          Whisper model: tiny, base, small, medium, large (default: medium)
  --diarize                Use WhisperX with speaker diarization
  --embed-images           Embed frame references for LLM analysis (recommended)
  --embed-quality          JPEG quality for frames (default: 80)

Frame Extraction:
  --scene-detection        Use FFmpeg scene detection (recommended)
  --scene-threshold        Detection sensitivity 0-100 (default: 15, lower=more sensitive)
  --interval               Extract frame every N seconds (alternative to scene detection)

Caching:
  --no-cache               Force complete reprocessing
  --skip-cache-frames      Re-extract frames only
  --skip-cache-whisper     Re-run transcription only
  --skip-cache-analysis    Re-run analysis only

Other:
  --transcript, -t         Path to existing Whisper transcript (JSON or TXT)
  --output, -o             Output file for enhanced transcript
  --output-dir             Directory for output files (default: output/)
  --verbose, -v            Enable verbose logging
```

## Tips for Best Results

### Scene Detection vs Interval
- **Scene detection** (`--scene-detection`): Recommended. Captures frames when content changes. More efficient.
- **Interval extraction** (`--interval N`): Alternative for continuous content. Captures every N seconds.

### Scene Detection Threshold
- Lower values (5-10): More sensitive, captures more frames
- Default (15): Good balance for most meetings
- Higher values (20-30): Less sensitive, fewer frames

### Whisper vs WhisperX
- **Whisper** (`--run-whisper`): Standard transcription, fast
- **WhisperX** (`--run-whisper --diarize`): Adds speaker identification, requires HuggingFace token

### Frame Quality
- Default quality (80) works well for most cases
- Use `--embed-quality 60` for smaller files if storage is a concern

### Deduplication
- Enabled by default - removes similar consecutive frames
- Disable with `--no-deduplicate` if slides/screens change subtly

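The idea behind removing similar consecutive frames can be sketched as follows. This README does not describe the similarity metric the tool uses, so the mean absolute pixel difference, the `max_diff` value, and the function name below are assumptions chosen only to illustrate the principle.

```python
def drop_similar_frames(frames, max_diff=5.0):
    """Keep a frame only if it differs enough from the last kept frame.

    frames: list of (name, pixels) where pixels is a non-empty flat list of
    0-255 grayscale values, all of the same length.
    """
    kept = []
    for name, pixels in frames:
        if kept:
            previous = kept[-1][1]
            diff = sum(abs(a - b) for a, b in zip(pixels, previous)) / len(pixels)
            if diff <= max_diff:
                continue  # near-duplicate of the last kept frame
        kept.append((name, pixels))
    return [name for name, _ in kept]


frames = [
    ("slide1_a", [10, 10, 200, 200]),
    ("slide1_b", [11, 10, 199, 200]),  # tiny change, e.g. a cursor moved
    ("slide2", [200, 200, 10, 10]),
]
print(drop_similar_frames(frames))
```

A subtle but meaningful change, such as one number updating in a table, can fall below any such cutoff and be dropped, which is the case where `--no-deduplicate` is useful.
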
## Troubleshooting

### Frame Extraction Issues

**"No frames extracted"**
- Check video file is valid: `ffmpeg -i video.mkv`
- Try lower scene threshold: `--scene-threshold 5`
- Try interval extraction: `--interval 3`
- Check disk space in output directory

**Scene detection not working**
- Ensure FFmpeg is installed
- Falls back to interval extraction automatically
- Try manual interval: `--interval 5`

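If you are unsure whether FFmpeg is visible to Python at all, a quick check like the one below can rule that out before you start tuning thresholds. `missing_tools` is a hypothetical helper, not part of this project; `ffprobe` is included because it normally ships alongside `ffmpeg`.

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("FFmpeg tools found")
```
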
### Whisper/WhisperX Issues

**WhisperX diarization not working**
- Ensure you have a HuggingFace token set
- Token needs access to pyannote models
- Fall back to standard Whisper without `--diarize`

### Cache Issues

**Cache not being used**
- Ensure you're using the same video filename
- Check that output directory contains cached files
- Use `--verbose` to see what's being cached/loaded

**Want to re-run specific steps**
- `--skip-cache-frames`: Re-extract frames
- `--skip-cache-whisper`: Re-run transcription
- `--skip-cache-analysis`: Re-run analysis
- `--no-cache`: Force complete reprocessing

## Experimental Features

### OCR and Vision Analysis

OCR (`--ocr-engine`) and vision analysis (`--use-vision`) options are available but experimental. The recommended approach is `--embed-images`, which embeds frame references directly in the transcript and lets your LLM analyze the images.

```bash
# Experimental: OCR extraction
python process_meeting.py samples/meeting.mkv --run-whisper --ocr-engine tesseract

# Experimental: Vision model analysis
python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --vision-model llava:13b

# Experimental: Hybrid OpenCV + OCR
python process_meeting.py samples/meeting.mkv --run-whisper --use-hybrid
```

## Project Structure

```
meetus/
├── meetus/                    # Main package
│   ├── __init__.py
│   ├── workflow.py            # Processing orchestrator
│   ├── output_manager.py      # Output directory & manifest management
│   ├── cache_manager.py       # Caching logic
│   ├── frame_extractor.py     # Video frame extraction (FFmpeg scene detection)
│   ├── vision_processor.py    # Vision model analysis (experimental)
│   ├── ocr_processor.py       # OCR processing (experimental)
│   └── transcript_merger.py   # Transcript merging
├── process_meeting.py         # Main CLI script
├── requirements.txt           # Python dependencies
├── output/                    # Timestamped output directories
│   └── YYYYMMDD_HHMMSS-video/ # Auto-generated per video
├── samples/                   # Sample videos (gitignored)
└── README.md                  # This file
```

## License

For personal use.