# Meeting Processor
Extract screen content from meeting recordings and merge with Whisper transcripts for better Claude summarization.
## Overview
This tool enhances meeting transcripts by combining:
- **Audio transcription** (from Whisper)
- **Screen content** (OCR from screen shares)
The result is a rich, timestamped transcript that provides full context for AI summarization.
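For illustration, a merged entry might look like the following. This excerpt is a hypothetical sketch of the idea, not verbatim tool output; the exact layout is produced by `transcript_merger.py`:

```
[00:05:12] SPEAKER: So if you look at the latency graph...
[00:05:15] SCREEN:  p99 latency: 240 ms | errors: 0.3%
```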
## Installation
### 1. System Dependencies
**Tesseract OCR** (recommended):
```bash
# Ubuntu/Debian
sudo apt-get install tesseract-ocr
# macOS
brew install tesseract
# Arch Linux
sudo pacman -S tesseract
```
**FFmpeg** (for scene detection):
```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg
# macOS
brew install ffmpeg
```
### 2. Python Dependencies
```bash
pip install -r requirements.txt
```
### 3. Whisper (for audio transcription)
```bash
pip install openai-whisper
```
### 4. Optional: Install Alternative OCR Engines
```bash
# EasyOCR (better for rotated/handwritten text)
pip install easyocr
# PaddleOCR (better for code/terminal screens)
pip install paddleocr
```
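Because the extra engines are optional, availability can be probed at runtime before selecting one. A minimal sketch of such a check using only the standard library (the engine-to-module mapping here is an assumption for illustration, not the tool's actual code):

```python
from importlib.util import find_spec

# Map CLI engine names to the Python modules they require (assumed mapping).
OCR_MODULES = {
    "tesseract": "pytesseract",
    "easyocr": "easyocr",
    "paddleocr": "paddleocr",
}

def available_ocr_engines():
    """Return which optional OCR backends are importable in this environment."""
    return {name: find_spec(module) is not None
            for name, module in OCR_MODULES.items()}
```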
## Quick Start
### Recommended: Run Everything in One Command
```bash
python process_meeting.py samples/meeting.mkv --run-whisper
```
This will:
1. Run Whisper transcription (audio → text)
2. Extract frames every 5 seconds
3. Run OCR to extract screen text
4. Merge audio + screen content
5. Save everything to `output/` folder
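Step 4 (merging) can be thought of as a timestamp join: each Whisper segment picks up the OCR text of the frames that fall inside its time window. The actual logic lives in `transcript_merger.py`; the sketch below follows Whisper's JSON segment field names (`start`, `end`, `text`) but is otherwise an illustrative assumption:

```python
def merge_transcript(segments, frames):
    """Attach on-screen OCR text to the Whisper segment it overlaps.

    segments: [{"start": float, "end": float, "text": str}]  (Whisper-style)
    frames:   [{"timestamp": float, "text": str}]            (OCR output)
    """
    merged = []
    for seg in segments:
        # Frames whose capture time falls inside this segment's window.
        on_screen = [f["text"] for f in frames
                     if seg["start"] <= f["timestamp"] < seg["end"]]
        entry = {"start": seg["start"], "speech": seg["text"].strip()}
        if on_screen:
            entry["screen"] = on_screen
        merged.append(entry)
    return merged
```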
### Alternative: Use Existing Whisper Transcript
If you already have a Whisper transcript:
```bash
python process_meeting.py samples/meeting.mkv --transcript output/meeting.json
```
### Screen Content Only (No Audio)
```bash
python process_meeting.py samples/meeting.mkv
```
## Usage Examples
### Run with different Whisper models
```bash
# Tiny model (fastest, less accurate)
python process_meeting.py samples/meeting.mkv --run-whisper --whisper-model tiny
# Small model (balanced)
python process_meeting.py samples/meeting.mkv --run-whisper --whisper-model small
# Large model (slowest, most accurate)
python process_meeting.py samples/meeting.mkv --run-whisper --whisper-model large
```
### Extract frames at different intervals
```bash
# Every 10 seconds (with Whisper)
python process_meeting.py samples/meeting.mkv --run-whisper --interval 10
# Every 3 seconds (more detailed)
python process_meeting.py samples/meeting.mkv --run-whisper --interval 3
```
### Use scene detection (smarter, fewer frames)
```bash
python process_meeting.py samples/meeting.mkv --run-whisper --scene-detection
```
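Conceptually, scene detection keeps a frame only when it differs enough from the last frame that was kept. The tool relies on FFmpeg for this; the pure-Python sketch below only illustrates the thresholding idea on flat grayscale pixel arrays, and the 0.3 default is an assumption:

```python
def detect_scene_changes(frames, threshold=0.3):
    """Return indices of frames that differ enough from the last kept frame.

    frames: list of flat grayscale pixel sequences (values 0-255).
    threshold: normalised mean absolute difference (0-1) that counts as a cut.
    """
    if not frames:
        return []
    kept = [0]
    prev = frames[0]
    for i, frame in enumerate(frames[1:], start=1):
        # Mean absolute pixel difference, normalised to the 0..1 range.
        diff = sum(abs(a - b) for a, b in zip(prev, frame)) / (255 * len(frame))
        if diff > threshold:
            kept.append(i)
            prev = frame
    return kept
```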
### Use different OCR engines
```bash
# EasyOCR (good for varied layouts)
python process_meeting.py samples/meeting.mkv --run-whisper --ocr-engine easyocr
# PaddleOCR (good for code/terminal)
python process_meeting.py samples/meeting.mkv --run-whisper --ocr-engine paddleocr
```
### Extract frames only (no merging)
```bash
python process_meeting.py samples/meeting.mkv --extract-only
```
### Custom output location
```bash
python process_meeting.py samples/meeting.mkv --run-whisper --output-dir my_outputs/
```
### Enable verbose logging
```bash
# Show detailed debug information
python process_meeting.py samples/meeting.mkv --run-whisper --verbose
```
## Output Files
All output files are saved to the `output/` directory by default:
- **`output/<video>_enhanced.txt`** - Enhanced transcript ready for Claude
- **`output/<video>.json`** - Whisper transcript (if `--run-whisper` was used)
- **`output/<video>_ocr.json`** - Raw OCR data with timestamps
- **`frames/`** - Extracted video frames (JPG files)
## Workflow for Meeting Analysis
### Complete Workflow (One Command!)
```bash
# Process everything in one step
python process_meeting.py samples/alo-intro1.mkv --run-whisper --scene-detection
# Output will be in output/alo-intro1_enhanced.txt
```
### Traditional Workflow (Separate Steps)
```bash
# 1. Extract audio and transcribe with Whisper (optional, if not using --run-whisper)
whisper samples/alo-intro1.mkv --model base --output_format json --output_dir output
# 2. Process video to extract screen content
python process_meeting.py samples/alo-intro1.mkv \
    --transcript output/alo-intro1.json \
    --scene-detection
# 3. Use the enhanced transcript with Claude
# Copy the content from output/alo-intro1_enhanced.txt and paste into Claude
```
### Example Prompt for Claude
```
Please summarize this meeting transcript. Pay special attention to:
1. Key decisions made
2. Action items
3. Technical details shown on screen
4. Any metrics or data presented
[Paste enhanced transcript here]
```
## Command Reference
```
usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--run-whisper]
                          [--whisper-model {tiny,base,small,medium,large}]
                          [--output OUTPUT] [--output-dir OUTPUT_DIR]
                          [--frames-dir FRAMES_DIR] [--interval INTERVAL]
                          [--scene-detection]
                          [--ocr-engine {tesseract,easyocr,paddleocr}]
                          [--no-deduplicate] [--extract-only]
                          [--format {detailed,compact}] [--verbose]
                          video

Options:
  video              Path to video file
  --transcript, -t   Path to Whisper transcript (JSON or TXT)
  --run-whisper      Run Whisper transcription before processing
  --whisper-model    Whisper model: tiny, base, small, medium, large (default: base)
  --output, -o       Output file for enhanced transcript
  --output-dir       Directory for output files (default: output/)
  --frames-dir       Directory to save extracted frames (default: frames/)
  --interval         Extract frame every N seconds (default: 5)
  --scene-detection  Use scene detection instead of interval extraction
  --ocr-engine       OCR engine: tesseract, easyocr, paddleocr (default: tesseract)
  --no-deduplicate   Disable text deduplication
  --extract-only     Only extract frames and OCR, skip transcript merging
  --format           Output format: detailed or compact (default: detailed)
  --verbose, -v      Enable verbose logging (DEBUG level)
```
## Tips for Best Results
### Scene Detection vs Interval
- **Scene detection**: Better for presentations with distinct slides. More efficient.
- **Interval extraction**: Better for continuous screen sharing (coding, browsing). More thorough.
### OCR Engine Selection
- **Tesseract**: Best for clean slides, documents, presentations. Fast and lightweight.
- **EasyOCR**: Better for handwriting, rotated text, or varied fonts.
- **PaddleOCR**: Excellent for code, terminal outputs, and mixed languages.
### Deduplication
- Enabled by default; removes frames whose OCR text is nearly identical to the previous frame's
- Disable with `--no-deduplicate` if slides change subtly
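The deduplication step can be approximated with a string-similarity threshold. A sketch using the standard library's `difflib` (the 0.9 cutoff is an assumption for illustration, not the tool's actual setting):

```python
from difflib import SequenceMatcher

def deduplicate_texts(texts, similarity=0.9):
    """Keep an OCR text only if it differs enough from the last kept one."""
    kept = []
    for text in texts:
        if kept and SequenceMatcher(None, kept[-1], text).ratio() >= similarity:
            continue  # near-duplicate of the previous frame's text, skip it
        kept.append(text)
    return kept
```

This is why subtly changing slides can be lost: small edits score as near-duplicates, which is when `--no-deduplicate` helps.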
## Troubleshooting
### "pytesseract not installed"
```bash
pip install pytesseract
sudo apt-get install tesseract-ocr # Don't forget system package!
```
### "No frames extracted"
- Check video file is valid: `ffmpeg -i video.mkv`
- Try lower interval: `--interval 3`
- Check disk space in frames directory
### Poor OCR quality
- Try different OCR engine
- Check if video resolution is sufficient
- Use `--no-deduplicate` to keep more frames
### Scene detection not working
- The tool falls back to interval extraction automatically
- Ensure FFmpeg is installed
- Try manual interval: `--interval 5`
## Project Structure
```
meetus/
├── meetus/                  # Main package
│   ├── __init__.py
│   ├── frame_extractor.py   # Video frame extraction
│   ├── ocr_processor.py     # OCR processing
│   └── transcript_merger.py # Transcript merging
├── process_meeting.py       # Main CLI script
├── requirements.txt         # Python dependencies
└── README.md                # This file
```
## License
For personal use.