# Meeting Processor

Extract screen content from meeting recordings and merge with Whisper transcripts for better Claude summarization.

## Overview

This tool enhances meeting transcripts by combining:

- **Audio transcription** (from Whisper)
- **Screen content** (OCR from screen shares)

The result is a rich, timestamped transcript that provides full context for AI summarization.
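At its core, the merge is timestamp interleaving: Whisper speech segments and OCR screen snapshots are sorted into one chronological stream. A minimal sketch of the idea (the function and field names here are illustrative, not the tool's actual API):

```python
# Interleave Whisper speech segments and OCR screen snapshots by time.
# Field names ("start", "time", "text") are illustrative assumptions.

def interleave(speech_segments, screen_snapshots):
    """Merge two timestamped streams into one chronological event list."""
    events = [(s["start"], "SPEECH", s["text"]) for s in speech_segments]
    events += [(s["time"], "SCREEN", s["text"]) for s in screen_snapshots]
    events.sort(key=lambda e: e[0])  # screen text lands next to what was said
    return events

merged = interleave(
    [{"start": 0.0, "text": "Let's review the dashboard."},
     {"start": 12.5, "text": "Errors spiked around nine."}],
    [{"time": 5.0, "text": "Grafana: error_rate 4.2%"}],
)
```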
## Installation

### 1. System Dependencies

**Tesseract OCR** (recommended):

```bash
# Ubuntu/Debian
sudo apt-get install tesseract-ocr

# macOS
brew install tesseract

# Arch Linux
sudo pacman -S tesseract
```

**FFmpeg** (for scene detection):

```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg

# macOS
brew install ffmpeg
```

### 2. Python Dependencies

```bash
pip install -r requirements.txt
```
### 3. Optional: Install Alternative OCR Engines

```bash
# EasyOCR (better for rotated/handwritten text)
pip install easyocr

# PaddleOCR (better for code/terminal screens)
pip install paddleocr
```
## Quick Start

### Basic Usage (Screen Content Only)

```bash
python process_meeting.py samples/meeting.mkv
```

This will:

1. Extract frames every 5 seconds
2. Run OCR to extract screen text
3. Save the enhanced transcript to `meeting_enhanced.txt`
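Interval extraction amounts to sampling the timeline every N seconds and grabbing one frame per sample; roughly (the filename pattern below is an assumption, not the tool's actual naming scheme):

```python
def frame_plan(duration_seconds, interval=5):
    """Map each sample time (seconds) to an output frame filename.
    The zero-padded naming scheme here is illustrative."""
    return {t: f"frame_{t:06d}.jpg" for t in range(0, int(duration_seconds), interval)}

# A 23-second clip sampled every 5 seconds yields frames at 0, 5, 10, 15, 20.
plan = frame_plan(23, interval=5)
```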
### With Whisper Transcript

First, generate a Whisper transcript:

```bash
whisper samples/meeting.mkv --model base --output_format json
```

Then process with screen content:

```bash
python process_meeting.py samples/meeting.mkv --transcript samples/meeting.json
```
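Whisper's JSON output is a single object with a top-level `segments` list; each segment carries `start`/`end` times in seconds plus the spoken `text`, which is what the merger keys on. A quick sketch of reading it:

```python
import json

# Minimal example of the shape Whisper writes with --output_format json.
raw = json.loads("""
{"text": " Hello everyone.",
 "segments": [{"id": 0, "start": 0.0, "end": 2.4, "text": " Hello everyone."}],
 "language": "en"}
""")

for seg in raw["segments"]:
    stamp = f"[{seg['start']:.1f}s-{seg['end']:.1f}s]{seg['text']}"
```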
## Usage Examples

### Extract frames at different intervals

```bash
# Every 10 seconds
python process_meeting.py samples/meeting.mkv --interval 10

# Every 3 seconds (more detailed)
python process_meeting.py samples/meeting.mkv --interval 3
```

### Use scene detection (smarter, fewer frames)

```bash
python process_meeting.py samples/meeting.mkv --scene-detection
```
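Scene detection in the FFmpeg world typically means the `select` video filter with a scene-change score threshold: a frame is kept only when the picture changes enough. A sketch of the kind of command involved (the 0.3 threshold is an assumption, not necessarily what this tool uses):

```python
def scene_detect_cmd(video, out_pattern="frames/scene_%04d.jpg", threshold=0.3):
    """Build an FFmpeg command that keeps only frames whose scene-change
    score exceeds `threshold`. Run it with subprocess.run(cmd, check=True)."""
    return [
        "ffmpeg", "-i", video,
        "-vf", f"select='gt(scene,{threshold})'",
        "-vsync", "vfr",  # drop unselected frames, keep original timestamps
        out_pattern,
    ]

cmd = scene_detect_cmd("samples/meeting.mkv")
```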
### Use different OCR engines

```bash
# EasyOCR (good for varied layouts)
python process_meeting.py samples/meeting.mkv --ocr-engine easyocr

# PaddleOCR (good for code/terminal)
python process_meeting.py samples/meeting.mkv --ocr-engine paddleocr
```

### Extract frames only (no merging)

```bash
python process_meeting.py samples/meeting.mkv --extract-only
```

### Custom output location

```bash
python process_meeting.py samples/meeting.mkv --output my_meeting.txt --frames-dir my_frames/
```
### Enable verbose logging

```bash
# Show detailed debug information
python process_meeting.py samples/meeting.mkv --verbose

# Short form
python process_meeting.py samples/meeting.mkv -v
```

## Output Files

After processing, you'll get:

- **`<video>_enhanced.txt`** - Enhanced transcript ready for Claude
- **`<video>_ocr.json`** - Raw OCR data with timestamps
- **`frames/`** - Extracted video frames (JPG files)
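The shape of the enhanced transcript is easiest to see in code; a rough sketch of one `detailed`-style block (the exact markers such as `[SCREEN]` are an assumption, not the tool's guaranteed output format):

```python
def format_block(start_seconds, speech, screen_text=None):
    """Render one timestamped transcript block; layout is illustrative."""
    minutes, seconds = divmod(int(start_seconds), 60)
    lines = [f"[{minutes:02d}:{seconds:02d}] {speech}"]
    if screen_text:
        lines.append(f"    [SCREEN] {screen_text}")
    return "\n".join(lines)

block = format_block(95, "Here is the Q3 chart.", "Revenue up 14% QoQ")
```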
## Workflow for Meeting Analysis

### Complete Workflow

```bash
# 1. Extract audio and transcribe with Whisper
whisper samples/alo-intro1.mkv --model base --output_format json

# 2. Process the video to extract screen content
python process_meeting.py samples/alo-intro1.mkv \
    --transcript samples/alo-intro1.json \
    --scene-detection

# 3. Use the enhanced transcript with Claude
# Copy the content from alo-intro1_enhanced.txt and paste it into Claude
```

### Example Prompt for Claude

```
Please summarize this meeting transcript. Pay special attention to:
1. Key decisions made
2. Action items
3. Technical details shown on screen
4. Any metrics or data presented

[Paste enhanced transcript here]
```
## Command Reference

```
usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--output OUTPUT]
                          [--frames-dir FRAMES_DIR] [--interval INTERVAL]
                          [--scene-detection]
                          [--ocr-engine {tesseract,easyocr,paddleocr}]
                          [--no-deduplicate] [--extract-only]
                          [--format {detailed,compact}] [--verbose]
                          video

Options:
  video              Path to video file
  --transcript, -t   Path to Whisper transcript (JSON or TXT)
  --output, -o       Output file for enhanced transcript
  --frames-dir       Directory to save extracted frames (default: frames/)
  --interval         Extract a frame every N seconds (default: 5)
  --scene-detection  Use scene detection instead of interval extraction
  --ocr-engine       OCR engine: tesseract, easyocr, paddleocr (default: tesseract)
  --no-deduplicate   Disable text deduplication
  --extract-only     Only extract frames and OCR, skip transcript merging
  --format           Output format: detailed or compact (default: detailed)
  --verbose, -v      Enable verbose logging (DEBUG level)
```
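The reference above maps directly onto an `argparse` definition; a sketch that mirrors the documented flags (this reconstructs the interface from the table, not the script's actual source):

```python
import argparse

def build_parser():
    """Argparse layout matching the documented CLI (illustrative sketch)."""
    p = argparse.ArgumentParser(prog="process_meeting.py")
    p.add_argument("video", help="Path to video file")
    p.add_argument("--transcript", "-t", help="Path to Whisper transcript (JSON or TXT)")
    p.add_argument("--output", "-o", help="Output file for enhanced transcript")
    p.add_argument("--frames-dir", default="frames/")
    p.add_argument("--interval", type=int, default=5)
    p.add_argument("--scene-detection", action="store_true")
    p.add_argument("--ocr-engine", choices=["tesseract", "easyocr", "paddleocr"],
                   default="tesseract")
    p.add_argument("--no-deduplicate", action="store_true")
    p.add_argument("--extract-only", action="store_true")
    p.add_argument("--format", choices=["detailed", "compact"], default="detailed")
    p.add_argument("--verbose", "-v", action="store_true")
    return p

args = build_parser().parse_args(["meeting.mkv", "--interval", "10", "-v"])
```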
## Tips for Best Results

### Scene Detection vs. Interval Extraction

- **Scene detection**: best for presentations with distinct slides; captures fewer, more meaningful frames.
- **Interval extraction**: best for continuous screen sharing (coding, browsing); more thorough, but produces more frames.

### OCR Engine Selection

- **Tesseract**: best for clean slides, documents, and presentations; fast and lightweight.
- **EasyOCR**: better for handwriting, rotated text, or varied fonts.
- **PaddleOCR**: excellent for code, terminal output, and mixed languages.

### Deduplication

- Enabled by default; removes text from near-identical consecutive frames.
- Disable with `--no-deduplicate` if slides change only subtly.
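Text-level deduplication can be approximated entirely with the standard library: compare each frame's OCR text with the last one kept and drop near matches. A sketch (the 0.9 similarity cutoff is an assumed value, not necessarily the tool's setting):

```python
from difflib import SequenceMatcher

def deduplicate(ocr_texts, threshold=0.9):
    """Keep a frame's text only when it differs enough from the last kept one."""
    kept = []
    for text in ocr_texts:
        if kept and SequenceMatcher(None, kept[-1], text).ratio() >= threshold:
            continue  # near-duplicate of the previous kept frame
        kept.append(text)
    return kept

frames = ["Slide 1: Agenda", "Slide 1: Agenda.", "Slide 2: Results"]
unique = deduplicate(frames)
```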
## Troubleshooting

### "pytesseract not installed"

```bash
pip install pytesseract
sudo apt-get install tesseract-ocr  # Don't forget the system package!
```

### "No frames extracted"

- Check that the video file is valid: `ffmpeg -i video.mkv`
- Try a lower interval: `--interval 3`
- Check disk space in the frames directory

### Poor OCR quality

- Try a different OCR engine
- Check that the video resolution is sufficient
- Use `--no-deduplicate` to keep more frames

### Scene detection not working

- The tool falls back to interval extraction automatically
- Ensure FFmpeg is installed
- Try a manual interval: `--interval 5`
## Project Structure

```
meetus/
├── meetus/                    # Main package
│   ├── __init__.py
│   ├── frame_extractor.py     # Video frame extraction
│   ├── ocr_processor.py       # OCR processing
│   └── transcript_merger.py   # Transcript merging
├── process_meeting.py         # Main CLI script
├── requirements.txt           # Python dependencies
└── README.md                  # This file
```

## License

For personal use.