# Meeting Processor

Extract on-screen content from meeting recordings and merge it with Whisper transcripts, so Claude has more context when summarizing.

## Overview

This tool enhances meeting transcripts by combining:
- **Audio transcription** (from Whisper)
- **Screen content** (OCR from screen shares)

The result is a timestamped transcript that pairs what was said with what was shown on screen, giving an AI summarizer more context than audio alone.
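
As a rough illustration of the merge idea, the sketch below interleaves speech segments and OCR captures by timestamp. The function names, the `(seconds, text)` tuple layout, and the `[AUDIO]`/`[SCREEN]` labels are assumptions made for this example, not the tool's actual output format.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS (hours are folded into the minutes field)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def merge_by_time(audio, screen):
    """Interleave (start_seconds, text) pairs from both sources in time order."""
    tagged = [(t, "AUDIO", text) for t, text in audio]
    tagged += [(t, "SCREEN", text) for t, text in screen]
    # list.sort() is stable, so on equal timestamps audio entries stay first.
    tagged.sort(key=lambda item: item[0])
    return [f"[{format_timestamp(t)}] [{kind}] {text}" for t, kind, text in tagged]


if __name__ == "__main__":
    audio = [(0.0, "Welcome everyone."), (12.5, "Let's look at the roadmap.")]
    screen = [(10.0, "Q3 Roadmap")]
    print("\n".join(merge_by_time(audio, screen)))
```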

## Installation

### 1. System Dependencies

**Tesseract OCR** (recommended):
```bash
# Ubuntu/Debian
sudo apt-get install tesseract-ocr

# macOS
brew install tesseract

# Arch Linux
sudo pacman -S tesseract
```

**FFmpeg** (for scene detection):
```bash
# Ubuntu/Debian
sudo apt-get install ffmpeg

# macOS
brew install ffmpeg

# Arch Linux
sudo pacman -S ffmpeg
```

### 2. Python Dependencies

```bash
pip install -r requirements.txt
```

### 3. Optional: Install Alternative OCR Engines

```bash
# EasyOCR (better for rotated/handwritten text)
pip install easyocr

# PaddleOCR (better for code/terminal screens)
pip install paddleocr
```

## Quick Start

### Basic Usage (Screen Content Only)

```bash
python process_meeting.py samples/meeting.mkv
```

This will:
1. Extract frames every 5 seconds
2. Run OCR to extract screen text
3. Save enhanced transcript to `meeting_enhanced.txt`
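
To get a feel for how many frames a given interval produces, here is a minimal sketch of interval sampling. `sample_times` is a hypothetical helper written for this README; it assumes captures start at 0 seconds and stop before the end of the video, which may differ from what the tool does.

```python
import math


def sample_times(duration_s: float, interval_s: float = 5.0) -> list[float]:
    """Return capture timestamps 0, interval, 2*interval, ... below duration_s."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    count = math.ceil(duration_s / interval_s)
    # Multiply by the index instead of accumulating to avoid float drift.
    return [i * interval_s for i in range(count)]


if __name__ == "__main__":
    # A one-hour meeting at the default interval.
    print(len(sample_times(3600)), "frames")
```

Under these assumptions, a one-hour recording yields 720 frames at the default 5-second interval and 360 frames at `--interval 10`.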

### With Whisper Transcript

First, generate a Whisper transcript:
```bash
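# --output_dir keeps the JSON next to the video (Whisper writes to the current directory by default)
whisper samples/meeting.mkv --model base --output_format json --output_dir samples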
```

Then process with screen content:
```bash
python process_meeting.py samples/meeting.mkv --transcript samples/meeting.json
```
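
For reference, Whisper's JSON output contains a top-level `segments` list whose entries carry `start`, `end`, and `text` fields. The sketch below reads those fields; `parse_whisper_json` is a hypothetical helper for illustration, not a function from this package.

```python
import json


def parse_whisper_json(raw: str) -> list[tuple[float, float, str]]:
    """Return (start, end, text) for each segment in a Whisper JSON document."""
    data = json.loads(raw)
    return [
        (float(seg["start"]), float(seg["end"]), seg["text"].strip())
        for seg in data.get("segments", [])
    ]


if __name__ == "__main__":
    with open("samples/meeting.json", encoding="utf-8") as f:
        for start, end, text in parse_whisper_json(f.read()):
            print(f"{start:7.1f}s - {end:7.1f}s  {text}")
```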

## Usage Examples

### Extract frames at different intervals
```bash
# Every 10 seconds
python process_meeting.py samples/meeting.mkv --interval 10

# Every 3 seconds (more detailed)
python process_meeting.py samples/meeting.mkv --interval 3
```

### Use scene detection (smarter, fewer frames)
```bash
python process_meeting.py samples/meeting.mkv --scene-detection
```
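
One common way to detect scene changes with FFmpeg is the `select` filter with a `scene` score threshold, combined with `showinfo` to log the timestamps of the selected frames. The sketch below builds such a command and parses the log. Whether this tool uses the same filter is an assumption, and the 0.3 threshold and both function names are illustrative.

```python
import re

# Matches the timestamp field that FFmpeg's showinfo filter logs per frame.
PTS_TIME = re.compile(r"pts_time:\s*([0-9]+(?:\.[0-9]+)?)")


def scene_command(video_path: str, threshold: float = 0.3) -> list[str]:
    """Build an ffmpeg command that logs frames whose scene score exceeds threshold."""
    video_filter = f"select='gt(scene,{threshold})',showinfo"
    # "-f null -" discards the decoded frames; only the log output is needed.
    return ["ffmpeg", "-hide_banner", "-i", video_path,
            "-vf", video_filter, "-f", "null", "-"]


def parse_scene_times(ffmpeg_log: str) -> list[float]:
    """Extract scene-change timestamps (seconds) from showinfo log lines."""
    times = []
    for line in ffmpeg_log.splitlines():
        if "showinfo" not in line:
            continue
        match = PTS_TIME.search(line)
        if match:
            times.append(float(match.group(1)))
    return times
```

FFmpeg writes its log to stderr, so a caller would typically run `subprocess.run(scene_command(path), capture_output=True, text=True)` and pass the `.stderr` attribute to `parse_scene_times`.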

### Use different OCR engines
```bash
# EasyOCR (good for varied layouts)
python process_meeting.py samples/meeting.mkv --ocr-engine easyocr

# PaddleOCR (good for code/terminal)
python process_meeting.py samples/meeting.mkv --ocr-engine paddleocr
```

### Extract frames only (no merging)
```bash
python process_meeting.py samples/meeting.mkv --extract-only
```

### Custom output location
```bash
python process_meeting.py samples/meeting.mkv --output my_meeting.txt --frames-dir my_frames/
```

### Enable verbose logging
```bash
# Show detailed debug information
python process_meeting.py samples/meeting.mkv --verbose

# Short form
python process_meeting.py samples/meeting.mkv -v
```

## Output Files

After processing, you'll get:

- **`<video>_enhanced.txt`** - Enhanced transcript ready for Claude
- **`<video>_ocr.json`** - Raw OCR data with timestamps
- **`frames/`** - Extracted video frames (JPG files)

## Workflow for Meeting Analysis

### Complete Workflow

```bash
# 1. Extract audio and transcribe with Whisper
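# (--output_dir keeps the JSON next to the video; Whisper writes to the current directory by default)
whisper samples/alo-intro1.mkv --model base --output_format json --output_dir samples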

# 2. Process video to extract screen content
python process_meeting.py samples/alo-intro1.mkv \
    --transcript samples/alo-intro1.json \
    --scene-detection

# 3. Use the enhanced transcript with Claude
# Copy the content from alo-intro1_enhanced.txt and paste into Claude
```

### Example Prompt for Claude

```
Please summarize this meeting transcript. Pay special attention to:
1. Key decisions made
2. Action items
3. Technical details shown on screen
4. Any metrics or data presented

[Paste enhanced transcript here]
```

## Command Reference

```
usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--output OUTPUT]
                          [--frames-dir FRAMES_DIR] [--interval INTERVAL]
                          [--scene-detection]
                          [--ocr-engine {tesseract,easyocr,paddleocr}]
                          [--no-deduplicate] [--extract-only]
                          [--format {detailed,compact}] [--verbose]
                          video

Options:
  video                 Path to video file
  --transcript, -t      Path to Whisper transcript (JSON or TXT)
  --output, -o          Output file for enhanced transcript
  --frames-dir          Directory to save extracted frames (default: frames/)
  --interval            Extract frame every N seconds (default: 5)
  --scene-detection     Use scene detection instead of interval extraction
  --ocr-engine          OCR engine: tesseract, easyocr, paddleocr (default: tesseract)
  --no-deduplicate      Disable text deduplication
  --extract-only        Only extract frames and OCR, skip transcript merging
  --format              Output format: detailed or compact (default: detailed)
  --verbose, -v         Enable verbose logging (DEBUG level)
```
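
If you want to drive the same options from your own script, the flags above map onto a standard `argparse` parser roughly as follows. This is a reconstruction from the usage text, not the script's actual source; in particular, the integer type for `--interval` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the documented CLI surface (a sketch, not the real parser)."""
    p = argparse.ArgumentParser(prog="process_meeting.py")
    p.add_argument("video", help="Path to video file")
    p.add_argument("--transcript", "-t", help="Path to Whisper transcript (JSON or TXT)")
    p.add_argument("--output", "-o", help="Output file for enhanced transcript")
    p.add_argument("--frames-dir", default="frames/", help="Directory for extracted frames")
    # Assumption: whole seconds only; the real script may accept floats.
    p.add_argument("--interval", type=int, default=5, help="Extract frame every N seconds")
    p.add_argument("--scene-detection", action="store_true")
    p.add_argument("--ocr-engine", choices=["tesseract", "easyocr", "paddleocr"],
                   default="tesseract")
    p.add_argument("--no-deduplicate", action="store_true")
    p.add_argument("--extract-only", action="store_true")
    p.add_argument("--format", choices=["detailed", "compact"], default="detailed")
    p.add_argument("--verbose", "-v", action="store_true")
    return p


if __name__ == "__main__":
    print(build_parser().parse_args())
```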

## Tips for Best Results

### Scene Detection vs Interval
- **Scene detection**: Better for presentations with distinct slides. More efficient.
- **Interval extraction**: Better for continuous screen sharing (coding, browsing). More thorough.

### OCR Engine Selection
- **Tesseract**: Best for clean slides, documents, presentations. Fast and lightweight.
- **EasyOCR**: Better for handwriting, rotated text, or varied fonts.
- **PaddleOCR**: Excellent for code, terminal outputs, and mixed languages.

### Deduplication
- Enabled by default: consecutive frames whose extracted text is similar are removed
- Disable with `--no-deduplicate` if slides change only subtly and you need every variation
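
To show why subtle changes can get dropped, here is a minimal sketch of consecutive-text deduplication. It uses `difflib.SequenceMatcher` from the standard library with an assumed similarity threshold of 0.9; the tool's own similarity measure and threshold may differ.

```python
from difflib import SequenceMatcher


def deduplicate(texts: list[str], threshold: float = 0.9) -> list[str]:
    """Drop each text that is near-identical to the most recently kept one."""
    kept: list[str] = []
    for text in texts:
        # Compare against the last kept entry, not merely the previous input.
        if kept and SequenceMatcher(None, kept[-1], text).ratio() >= threshold:
            continue
        kept.append(text)
    return kept


if __name__ == "__main__":
    frames = ["Slide 1: Intro", "Slide 1: lntro", "Slide 2: Roadmap"]
    print(deduplicate(frames))
```

In this example the second entry differs from the first by a single misread character, so it is treated as a duplicate. A slide where only one number changes would be dropped the same way, which is the case `--no-deduplicate` is meant for.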

## Troubleshooting

### "pytesseract not installed"
```bash
pip install pytesseract
sudo apt-get install tesseract-ocr  # Don't forget system package!
```
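
A quick way to see which piece is missing is to check for both the system binaries and the Python module. The snippet below is a standalone sketch using only the standard library; `check_dependencies` is a hypothetical helper, not part of this package.

```python
import importlib.util
import shutil


def check_dependencies() -> dict[str, bool]:
    """Report whether the system binaries and the Python wrapper are available."""
    return {
        "tesseract binary": shutil.which("tesseract") is not None,
        "ffmpeg binary": shutil.which("ffmpeg") is not None,
        "pytesseract module": importlib.util.find_spec("pytesseract") is not None,
    }


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```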

### "No frames extracted"
- Check that the video file is valid: `ffmpeg -i video.mkv`
- Try a lower interval: `--interval 3`
- Check the available disk space in the frames directory

### Poor OCR quality
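- Try a different OCR engine (`--ocr-engine easyocr` or `--ocr-engine paddleocr`)
- Check whether the video resolution is high enough for the on-screen text to be legible
- Use `--no-deduplicate` to keep more frames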

### Scene detection not working
- The tool falls back to interval extraction automatically
- Ensure FFmpeg is installed
- Try a manual interval instead: `--interval 5`

## Project Structure

```
meetus/
├── meetus/                  # Main package
│   ├── __init__.py
│   ├── frame_extractor.py   # Video frame extraction
│   ├── ocr_processor.py     # OCR processing
│   └── transcript_merger.py # Transcript merging
├── process_meeting.py       # Main CLI script
├── requirements.txt         # Python dependencies
└── README.md                # This file
```

## License

For personal use.