mariano/mitus

Mariano Gabriel 93e0c06d38 init commit

2025-10-19 22:17:38 -03:00

Meeting Processor

Extract screen content from meeting recordings and merge it with Whisper transcripts for better Claude summarization.

Overview

This tool enhances meeting transcripts by combining:

Audio transcription (from Whisper)
Screen content (OCR from screen shares)

The result is a rich, timestamped transcript that provides full context for AI summarization.
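As a rough illustration of what "timestamped transcript" means here, the sketch below interleaves audio lines and screen text by time. The `Entry` class, the `AUDIO`/`SCREEN` labels, and the `[MM:SS]` layout are assumptions made for this example, not the tool's actual output format.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    time: float   # seconds from the start of the recording
    source: str   # "AUDIO" or "SCREEN" (labels are illustrative)
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a time offset as MM:SS (fractions of a second are dropped)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def merge_timeline(audio, screen):
    """Interleave (time, text) pairs from both sources in chronological order."""
    entries = [Entry(t, "AUDIO", txt) for t, txt in audio]
    entries += [Entry(t, "SCREEN", txt) for t, txt in screen]
    # list.sort is stable, so at equal times audio stays ahead of screen.
    entries.sort(key=lambda e: e.time)
    return [f"[{format_timestamp(e.time)}] {e.source}: {e.text}" for e in entries]


audio = [(0.0, "Welcome everyone."), (12.5, "Here is the dashboard.")]
screen = [(10.0, "Q3 Revenue Dashboard")]
merged = merge_timeline(audio, screen)
print("\n".join(merged))
```

The screen text lands between the two spoken lines, which is the context a summarizer would otherwise be missing.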

Installation

1. System Dependencies

Tesseract OCR (recommended):

# Ubuntu/Debian
sudo apt-get install tesseract-ocr

# macOS
brew install tesseract

# Arch Linux
sudo pacman -S tesseract

FFmpeg (for scene detection):

# Ubuntu/Debian
sudo apt-get install ffmpeg

# macOS
brew install ffmpeg

2. Python Dependencies

pip install -r requirements.txt

3. Optional: Install Alternative OCR Engines

# EasyOCR (better for rotated/handwritten text)
pip install easyocr

# PaddleOCR (better for code/terminal screens)
pip install paddleocr

Quick Start

Basic Usage (Screen Content Only)

python process_meeting.py samples/meeting.mkv

This will:

Extract frames every 5 seconds
Run OCR to extract screen text
Save enhanced transcript to meeting_enhanced.txt
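To get a feel for how many frames interval extraction produces, the sketch below lists the sampling times for a given duration. It assumes sampling starts at 0 seconds and stops before the end of the video; `sample_times` is a hypothetical helper, not part of the tool.

```python
import math


def sample_times(duration: float, interval: float = 5.0) -> list[float]:
    """Return the offsets (in seconds) at which a frame would be sampled."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    # Multiply by the index rather than accumulating, to avoid float drift.
    count = math.ceil(duration / interval)
    return [i * interval for i in range(count)]


times = sample_times(22)            # a 22-second clip at the default interval
hour = sample_times(3600, 5)        # a one-hour meeting
print(times, len(hour))
```

A one-hour recording at the default interval yields several hundred frames, which is worth keeping in mind when choosing an interval.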

With Whisper Transcript

First, generate a Whisper transcript:

whisper samples/meeting.mkv --model base --output_format json

Then process with screen content:

python process_meeting.py samples/meeting.mkv --transcript samples/meeting.json
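Whisper's JSON output contains a top-level `segments` list whose items carry `start`, `end`, and `text` fields. The sketch below shows one way to read those segments; `load_segments` is a hypothetical helper and the sample data is made up, so this is not necessarily how the tool parses the file.

```python
import json


def load_segments(raw: str):
    """Extract (start, end, text) tuples from a Whisper JSON document."""
    data = json.loads(raw)
    return [
        (seg["start"], seg["end"], seg["text"].strip())
        for seg in data.get("segments", [])
    ]


# Made-up sample shaped like Whisper's JSON output.
sample = json.dumps({
    "text": " Hello. Let's begin.",
    "segments": [
        {"id": 0, "start": 0.0, "end": 2.4, "text": " Hello."},
        {"id": 1, "start": 2.4, "end": 5.1, "text": " Let's begin."},
    ],
    "language": "en",
})
segments = load_segments(sample)
print(segments)
```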

Usage Examples

Extract frames at different intervals

# Every 10 seconds
python process_meeting.py samples/meeting.mkv --interval 10

# Every 3 seconds (more detailed)
python process_meeting.py samples/meeting.mkv --interval 3

Use scene detection (smarter, fewer frames)

python process_meeting.py samples/meeting.mkv --scene-detection
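The idea behind scene detection can be sketched in a few lines: keep a frame only when it differs sharply from the previous one. The tool itself relies on FFmpeg for this; the code below is only a toy illustration on tiny grayscale "frames", and the mean-absolute-difference metric and the threshold of 30 are assumptions.

```python
def mean_abs_diff(a, b):
    """Average per-pixel absolute difference between two equal-sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_scene_changes(frames, threshold=30.0):
    """Return indices of frames that start a new scene (the first frame always does)."""
    changes = [0] if frames else []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            changes.append(i)
    return changes


# Four 4x4 grayscale frames flattened to lists: one slide with slight
# compression noise, followed by a visibly different slide.
slide_a = [10] * 16
slide_a_noisy = [12] * 16
slide_b = [200] * 16
changes = detect_scene_changes([slide_a, slide_a_noisy, slide_b, slide_b])
print(changes)
```

Only two of the four frames are kept, which is why scene detection tends to produce fewer frames than fixed-interval sampling on slide-based meetings.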

Use different OCR engines

# EasyOCR (good for varied layouts)
python process_meeting.py samples/meeting.mkv --ocr-engine easyocr

# PaddleOCR (good for code/terminal)
python process_meeting.py samples/meeting.mkv --ocr-engine paddleocr

Extract frames only (no merging)

python process_meeting.py samples/meeting.mkv --extract-only

Custom output location

python process_meeting.py samples/meeting.mkv --output my_meeting.txt --frames-dir my_frames/

Enable verbose logging

# Show detailed debug information
python process_meeting.py samples/meeting.mkv --verbose

# Short form
python process_meeting.py samples/meeting.mkv -v

Output Files

After processing, you'll get:

<video>_enhanced.txt - Enhanced transcript ready for Claude
<video>_ocr.json - Raw OCR data with timestamps
frames/ - Extracted video frames (JPG files)
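The naming pattern above can be expressed with `pathlib`. This sketch derives the default names from the video path; whether the files are written to the current directory or next to the video is not stated here, so the helper (a hypothetical `default_outputs`) returns bare file names only.

```python
from pathlib import Path


def default_outputs(video: str) -> dict:
    """Derive default output names from the video file's stem."""
    stem = Path(video).stem
    return {
        "transcript": f"{stem}_enhanced.txt",
        "ocr": f"{stem}_ocr.json",
        "frames": "frames/",
    }


outputs = default_outputs("samples/meeting.mkv")
print(outputs)
```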

Workflow for Meeting Analysis

Complete Workflow

# 1. Extract audio and transcribe with Whisper
whisper samples/alo-intro1.mkv --model base --output_format json

# 2. Process video to extract screen content
python process_meeting.py samples/alo-intro1.mkv \
    --transcript samples/alo-intro1.json \
    --scene-detection

# 3. Use the enhanced transcript with Claude
# Copy the content from alo-intro1_enhanced.txt and paste into Claude

Example Prompt for Claude

Please summarize this meeting transcript. Pay special attention to:
1. Key decisions made
2. Action items
3. Technical details shown on screen
4. Any metrics or data presented

[Paste enhanced transcript here]
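If you prefer to assemble the prompt programmatically instead of pasting by hand, a minimal sketch looks like this. `build_prompt` is a hypothetical helper; it only concatenates the prompt text above with a transcript string and does not call any API.

```python
PROMPT_HEADER = """Please summarize this meeting transcript. Pay special attention to:
1. Key decisions made
2. Action items
3. Technical details shown on screen
4. Any metrics or data presented"""


def build_prompt(transcript: str) -> str:
    """Append the enhanced transcript to the summarization instructions."""
    return f"{PROMPT_HEADER}\n\n{transcript.strip()}\n"


prompt = build_prompt("  [00:00] AUDIO: Welcome everyone.  ")
print(prompt)
```

In practice the transcript string would come from reading the `_enhanced.txt` file produced by the tool.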

Command Reference

usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--output OUTPUT]
                          [--frames-dir FRAMES_DIR] [--interval INTERVAL]
                          [--scene-detection]
                          [--ocr-engine {tesseract,easyocr,paddleocr}]
                          [--no-deduplicate] [--extract-only]
                          [--format {detailed,compact}] [--verbose]
                          video

Options:
  video                 Path to video file
  --transcript, -t      Path to Whisper transcript (JSON or TXT)
  --output, -o          Output file for enhanced transcript
  --frames-dir          Directory to save extracted frames (default: frames/)
  --interval            Extract frame every N seconds (default: 5)
  --scene-detection     Use scene detection instead of interval extraction
  --ocr-engine          OCR engine: tesseract, easyocr, paddleocr (default: tesseract)
  --no-deduplicate      Disable text deduplication
  --extract-only        Only extract frames and OCR, skip transcript merging
  --format              Output format: detailed or compact (default: detailed)
  --verbose, -v         Enable verbose logging (DEBUG level)
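For readers who want to script around the CLI, the options above map naturally onto `argparse`. The sketch below is reconstructed from the usage text only; details such as `--interval` being an integer are assumptions, and the real `process_meeting.py` may be written differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (reconstruction, not the original)."""
    p = argparse.ArgumentParser(prog="process_meeting.py")
    p.add_argument("video", help="Path to video file")
    p.add_argument("--transcript", "-t", help="Path to Whisper transcript (JSON or TXT)")
    p.add_argument("--output", "-o", help="Output file for enhanced transcript")
    p.add_argument("--frames-dir", default="frames/", help="Directory to save extracted frames")
    # Assumption: the interval is a whole number of seconds.
    p.add_argument("--interval", type=int, default=5, help="Extract frame every N seconds")
    p.add_argument("--scene-detection", action="store_true")
    p.add_argument("--ocr-engine", choices=["tesseract", "easyocr", "paddleocr"],
                   default="tesseract")
    p.add_argument("--no-deduplicate", action="store_true")
    p.add_argument("--extract-only", action="store_true")
    p.add_argument("--format", choices=["detailed", "compact"], default="detailed")
    p.add_argument("--verbose", "-v", action="store_true")
    return p


args = build_parser().parse_args(
    ["samples/meeting.mkv", "--interval", "10", "--ocr-engine", "easyocr"]
)
print(args.video, args.interval, args.ocr_engine, args.frames_dir)
```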

Tips for Best Results

Scene Detection vs Interval

Scene detection: Better for presentations with distinct slides. More efficient.
Interval extraction: Better for continuous screen sharing (coding, browsing). More thorough.

OCR Engine Selection

Tesseract: Best for clean slides, documents, presentations. Fast and lightweight.
EasyOCR: Better for handwriting, rotated text, or varied fonts.
PaddleOCR: Excellent for code, terminal outputs, and mixed languages.

Deduplication

Enabled by default - removes similar consecutive frames
Disable with --no-deduplicate if slides change subtly
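The trade-off above is easier to see with a concrete example. The sketch below drops OCR text that is nearly identical to the previously kept text, using `difflib.SequenceMatcher` from the standard library. The similarity measure and the 0.9 threshold are assumptions chosen for illustration; the tool's own deduplication logic may differ.

```python
from difflib import SequenceMatcher


def deduplicate(texts, threshold=0.9):
    """Keep a text only if it differs enough from the last text that was kept."""
    kept = []
    for text in texts:
        if kept and SequenceMatcher(None, kept[-1], text).ratio() >= threshold:
            continue  # too similar to the previous kept frame
        kept.append(text)
    return kept


texts = [
    "Q3 Revenue: $1.2M",
    "Q3 Revenue: $1.2M",   # exact repeat of the same slide
    "Q3 Revenue: $1.3M",   # subtle change: a single digit
    "Roadmap 2026",
]
deduped = deduplicate(texts)
strict = deduplicate(texts, threshold=1.0)  # only exact repeats are dropped
print(deduped)
print(strict)
```

With the looser threshold, the slide where a single digit changed is discarded along with the exact repeat. That is the kind of subtle change for which disabling deduplication is the safer choice.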

Troubleshooting

"pytesseract not installed"

pip install pytesseract
sudo apt-get install tesseract-ocr  # Don't forget system package!

"No frames extracted"

Check that the video file is valid: ffmpeg -i video.mkv
Try a lower interval: --interval 3
Check the available disk space in the frames directory

Poor OCR quality

Try a different OCR engine
Check that the video resolution is high enough for the on-screen text to be legible
Use --no-deduplicate to keep more frames

Scene detection not working

The tool falls back to interval extraction automatically
Ensure FFmpeg is installed
Try manual interval: --interval 5
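Several of the checks in this section can be automated before a long run. The sketch below looks for the `ffmpeg` and `tesseract` executables on the PATH and checks free disk space, using only the standard library. `preflight` and the 100 MB minimum are hypothetical choices for this example, not part of the tool.

```python
import shutil


def preflight(frames_dir: str = ".", min_free_mb: int = 100) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for tool in ("ffmpeg", "tesseract"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # frames_dir must already exist for disk_usage to succeed.
    free_mb = shutil.disk_usage(frames_dir).free / (1024 * 1024)
    if free_mb < min_free_mb:
        problems.append(f"only {free_mb:.0f} MB free in {frames_dir}")
    return problems


problems = preflight()
for problem in problems:
    print("WARNING:", problem)
```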

Project Structure

meetus/
├── meetus/                  # Main package
│   ├── __init__.py
│   ├── frame_extractor.py   # Video frame extraction
│   ├── ocr_processor.py     # OCR processing
│   └── transcript_merger.py # Transcript merging
├── process_meeting.py       # Main CLI script
├── requirements.txt         # Python dependencies
└── README.md               # This file

License

For personal use.
