mariano/mitus



Mariano Gabriel ae89564373 add whisper to main command, ignore output files

2025-10-19 22:49:36 -03:00

7.7 KiB


Meeting Processor

Extract screen content from meeting recordings and merge it with Whisper transcripts so Claude can produce better summaries.

Overview

This tool enhances meeting transcripts by combining:

Audio transcription (from Whisper)
Screen content (OCR from screen shares)

The result is a single timestamped transcript that interleaves what was said with what was shown on screen, so an AI summarizer sees both.
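The merge can be pictured as a timestamp-ordered interleave of the two sources. The sketch below is a minimal illustration under stated assumptions, not the code in transcript_merger.py: the `Entry` type, the `AUDIO`/`SCREEN` labels and the `[HH:MM:SS]` line format are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    start: float  # seconds from the beginning of the recording
    source: str   # hypothetical label: "AUDIO" or "SCREEN"
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a position in seconds as HH:MM:SS."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def merge(audio: list[Entry], screen: list[Entry]) -> list[str]:
    """Interleave audio and screen entries in timestamp order."""
    # sorted() is stable, so at equal timestamps audio entries stay ahead of screen entries
    combined = sorted(audio + screen, key=lambda e: e.start)
    return [f"[{format_timestamp(e.start)}] {e.source}: {e.text}" for e in combined]


if __name__ == "__main__":
    audio = [Entry(0.0, "AUDIO", "Let's look at the dashboard."),
             Entry(12.5, "AUDIO", "Latency dropped last week.")]
    screen = [Entry(5.0, "SCREEN", "p95 latency: 120 ms")]
    for line in merge(audio, screen):
        print(line)
```

Sorting the combined list keeps the idea simple; a real merger may also group OCR text under the speech segment it overlaps.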

Installation

1. System Dependencies

Tesseract OCR (recommended):

# Ubuntu/Debian
sudo apt-get install tesseract-ocr

# macOS
brew install tesseract

# Arch Linux
sudo pacman -S tesseract

FFmpeg (for scene detection; Whisper also needs it to decode audio):

# Ubuntu/Debian
sudo apt-get install ffmpeg

# macOS
brew install ffmpeg
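To confirm both system tools are reachable before a long run, a quick PATH check is enough. This is a hypothetical helper (`check_system_tools` is not part of the project), and it only verifies that the executables are on PATH, not their versions.

```python
import shutil


def check_system_tools(tools=("tesseract", "ffmpeg")) -> dict[str, bool]:
    """Report whether each command-line tool can be found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_system_tools().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```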

2. Python Dependencies

pip install -r requirements.txt

3. Whisper (for audio transcription)

pip install openai-whisper

4. Optional: Install Alternative OCR Engines

# EasyOCR (better for rotated/handwritten text)
pip install easyocr

# PaddleOCR (better for code/terminal screens)
pip install paddleocr

Quick Start

Recommended: Run Everything in One Command

python process_meeting.py samples/meeting.mkv --run-whisper

This will:

Run Whisper transcription (audio → text)
Extract frames every 5 seconds
Run OCR to extract screen text
Merge audio + screen content
Save everything to output/ folder

Alternative: Use Existing Whisper Transcript

If you already have a Whisper transcript:

python process_meeting.py samples/meeting.mkv --transcript output/meeting.json
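A Whisper JSON transcript is the result dictionary Whisper returns, with a `segments` list whose entries carry `start`, `end` (seconds) and `text`. The sketch below reads those fields; the helper names are hypothetical, and it ignores the TXT transcripts that `--transcript` also accepts.

```python
import json
from pathlib import Path


def parse_segments(data: dict) -> list[tuple[float, float, str]]:
    """Return (start, end, text) for each segment of a Whisper result dict."""
    return [
        (float(seg["start"]), float(seg["end"]), seg["text"].strip())
        for seg in data.get("segments", [])
    ]


def load_whisper_segments(path: str) -> list[tuple[float, float, str]]:
    """Read a transcript written by `whisper ... --output_format json`."""
    return parse_segments(json.loads(Path(path).read_text(encoding="utf-8")))
```

For example, `load_whisper_segments("output/meeting.json")` yields one tuple per spoken segment, ready to be merged with OCR entries by time.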

Screen Content Only (No Audio)

python process_meeting.py samples/meeting.mkv

Usage Examples

Run with different Whisper models

# Tiny model (fastest, less accurate)
python process_meeting.py samples/meeting.mkv --run-whisper --whisper-model tiny

# Small model (balanced)
python process_meeting.py samples/meeting.mkv --run-whisper --whisper-model small

# Large model (slowest, most accurate)
python process_meeting.py samples/meeting.mkv --run-whisper --whisper-model large

Extract frames at different intervals

# Every 10 seconds (with Whisper)
python process_meeting.py samples/meeting.mkv --run-whisper --interval 10

# Every 3 seconds (more detailed)
python process_meeting.py samples/meeting.mkv --run-whisper --interval 3
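Interval extraction trades detail for frame count: a 30-minute recording yields about 360 frames at `--interval 5` and about 600 at `--interval 3`. The sketch below shows one way to do this with FFmpeg's `fps` filter; it is an assumption about the approach, not necessarily what frame_extractor.py does, and the `frame_%05d.jpg` naming is hypothetical.

```python
from pathlib import Path


def interval_extract_cmd(video: str, frames_dir: str = "frames", interval: int = 5) -> list[str]:
    """Build an ffmpeg command that writes one JPEG every `interval` seconds."""
    pattern = str(Path(frames_dir) / "frame_%05d.jpg")
    # fps=1/N keeps one frame per N seconds; -q:v 2 requests high JPEG quality
    return ["ffmpeg", "-i", video, "-vf", f"fps=1/{interval}", "-q:v", "2", pattern]


def frame_timestamp(index: int, interval: int = 5) -> int:
    """Approximate position (seconds) of frame `index`; ffmpeg numbers files from 1."""
    return (index - 1) * interval
```

Run the command with `subprocess.run(cmd, check=True)`. FFmpeg does not create the output directory, so create it first with `Path(frames_dir).mkdir(parents=True, exist_ok=True)`.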

Use scene detection (fewer frames, captured when the screen changes)

python process_meeting.py samples/meeting.mkv --run-whisper --scene-detection
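FFmpeg can do scene detection itself: the `select` filter keeps frames whose `scene` score (0 to 1) exceeds a threshold, and `showinfo` logs each kept frame's `pts_time` to stderr. The sketch below assumes this approach and a threshold of 0.3; both are illustrative choices, not the project's confirmed settings.

```python
import re
from pathlib import Path

# \b avoids matching longer field names that merely end in "pts_time"
PTS_TIME = re.compile(r"\bpts_time:\s*([0-9.]+)")


def scene_extract_cmd(video: str, frames_dir: str = "frames", threshold: float = 0.3) -> list[str]:
    """Build an ffmpeg command that keeps only frames at scene changes."""
    pattern = str(Path(frames_dir) / "scene_%05d.jpg")
    return [
        "ffmpeg", "-i", video,
        # keep frames whose scene-change score exceeds the threshold, and log them
        "-vf", f"select='gt(scene,{threshold})',showinfo",
        # write only the selected frames (newer FFmpeg prefers -fps_mode vfr)
        "-vsync", "vfr",
        pattern,
    ]


def parse_scene_times(stderr: str) -> list[float]:
    """Extract timestamps (seconds) of the kept frames from showinfo output."""
    return [float(m.group(1)) for m in PTS_TIME.finditer(stderr)]
```

Lower thresholds catch subtler slide changes at the cost of more frames.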

Use different OCR engines

# EasyOCR (good for varied layouts)
python process_meeting.py samples/meeting.mkv --run-whisper --ocr-engine easyocr

# PaddleOCR (good for code/terminal)
python process_meeting.py samples/meeting.mkv --run-whisper --ocr-engine paddleocr

Extract frames and run OCR only (no merging)

python process_meeting.py samples/meeting.mkv --extract-only

Custom output location

python process_meeting.py samples/meeting.mkv --run-whisper --output-dir my_outputs/

Enable verbose logging

# Show detailed debug information
python process_meeting.py samples/meeting.mkv --run-whisper --verbose

Output Files

Output files are saved to the output/ directory by default (change with --output-dir); extracted frames go to frames/ (change with --frames-dir):

output/<video>_enhanced.txt - Enhanced transcript ready for Claude
output/<video>.json - Whisper transcript (if --run-whisper was used)
output/<video>_ocr.json - Raw OCR data with timestamps
frames/ - Extracted video frames (JPG files)
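The file names follow the video's base name. A hypothetical helper that derives them, assuming `<video>` means the file name without its extension:

```python
from pathlib import Path


def output_paths(video: str, output_dir: str = "output", frames_dir: str = "frames") -> dict[str, Path]:
    """Derive the default output locations for a given video file."""
    stem = Path(video).stem  # "samples/meeting.mkv" -> "meeting"
    out = Path(output_dir)
    return {
        "enhanced": out / f"{stem}_enhanced.txt",
        "whisper": out / f"{stem}.json",
        "ocr": out / f"{stem}_ocr.json",
        "frames": Path(frames_dir),
    }
```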

Workflow for Meeting Analysis

Complete Workflow (One Command!)

# Process everything in one step
python process_meeting.py samples/alo-intro1.mkv --run-whisper --scene-detection

# Output will be in output/alo-intro1_enhanced.txt

Traditional Workflow (Separate Steps)

# 1. Transcribe the audio with the Whisper CLI (Whisper extracts the audio track itself via FFmpeg)
whisper samples/alo-intro1.mkv --model base --output_format json --output_dir output

# 2. Process video to extract screen content
python process_meeting.py samples/alo-intro1.mkv \
    --transcript output/alo-intro1.json \
    --scene-detection

# 3. Use the enhanced transcript with Claude
# Copy the content from output/alo-intro1_enhanced.txt and paste into Claude

Example Prompt for Claude

Please summarize this meeting transcript. Pay special attention to:
1. Key decisions made
2. Action items
3. Technical details shown on screen
4. Any metrics or data presented

[Paste enhanced transcript here]
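Instead of pasting by hand, the prompt above can be filled in from the enhanced transcript file. The helper names are hypothetical; the template text is the prompt shown above.

```python
from pathlib import Path

PROMPT_TEMPLATE = """Please summarize this meeting transcript. Pay special attention to:
1. Key decisions made
2. Action items
3. Technical details shown on screen
4. Any metrics or data presented

{transcript}"""


def fill_prompt(transcript: str) -> str:
    """Insert transcript text into the prompt (braces in the transcript are safe)."""
    return PROMPT_TEMPLATE.format(transcript=transcript.strip())


def build_prompt(enhanced_path: str) -> str:
    """Read an *_enhanced.txt file and return the full prompt."""
    return fill_prompt(Path(enhanced_path).read_text(encoding="utf-8"))
```

For example, `print(build_prompt("output/alo-intro1_enhanced.txt"))` prints the prompt ready to copy into Claude.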

Command Reference

usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--run-whisper]
                          [--whisper-model {tiny,base,small,medium,large}]
                          [--output OUTPUT] [--output-dir OUTPUT_DIR]
                          [--frames-dir FRAMES_DIR] [--interval INTERVAL]
                          [--scene-detection]
                          [--ocr-engine {tesseract,easyocr,paddleocr}]
                          [--no-deduplicate] [--extract-only]
                          [--format {detailed,compact}] [--verbose]
                          video

Options:
  video                 Path to video file
  --transcript, -t      Path to Whisper transcript (JSON or TXT)
  --run-whisper         Run Whisper transcription before processing
  --whisper-model       Whisper model: tiny, base, small, medium, large (default: base)
  --output, -o          Output file for enhanced transcript
  --output-dir          Directory for output files (default: output/)
  --frames-dir          Directory to save extracted frames (default: frames/)
  --interval            Extract frame every N seconds (default: 5)
  --scene-detection     Use scene detection instead of interval extraction
  --ocr-engine          OCR engine: tesseract, easyocr, paddleocr (default: tesseract)
  --no-deduplicate      Disable text deduplication
  --extract-only        Only extract frames and OCR, skip transcript merging
  --format              Output format: detailed or compact (default: detailed)
  --verbose, -v         Enable verbose logging (DEBUG level)
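The options above map naturally onto `argparse`. This is a reconstruction from the usage text, not the script's actual source; `type=int` for `--interval` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the documented CLI (a sketch, not process_meeting.py itself)."""
    p = argparse.ArgumentParser(prog="process_meeting.py")
    p.add_argument("video", help="Path to video file")
    p.add_argument("--transcript", "-t", help="Path to Whisper transcript (JSON or TXT)")
    p.add_argument("--run-whisper", action="store_true", help="Run Whisper before processing")
    p.add_argument("--whisper-model", default="base",
                   choices=["tiny", "base", "small", "medium", "large"])
    p.add_argument("--output", "-o", help="Output file for enhanced transcript")
    p.add_argument("--output-dir", default="output/")
    p.add_argument("--frames-dir", default="frames/")
    p.add_argument("--interval", type=int, default=5)  # assumed integer seconds
    p.add_argument("--scene-detection", action="store_true")
    p.add_argument("--ocr-engine", default="tesseract",
                   choices=["tesseract", "easyocr", "paddleocr"])
    p.add_argument("--no-deduplicate", action="store_true")
    p.add_argument("--extract-only", action="store_true")
    p.add_argument("--format", default="detailed", choices=["detailed", "compact"])
    p.add_argument("--verbose", "-v", action="store_true")
    return p
```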

Tips for Best Results

Scene Detection vs Interval

Scene detection: Better for presentations with distinct slides. Extracts fewer frames, so OCR finishes sooner.
Interval extraction: Better for continuous screen sharing (coding, browsing). Less likely to miss gradual changes.

OCR Engine Selection

Tesseract: Best for clean slides, documents, presentations. Fast and lightweight.
EasyOCR: Better for handwriting, rotated text, or varied fonts.
PaddleOCR: Excellent for code, terminal outputs, and mixed languages.

Deduplication

Enabled by default: removes consecutive frames whose text is nearly identical
Disable with --no-deduplicate if slides change subtly
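One common way to deduplicate is to compare each frame's OCR text with the last text kept and skip it above a similarity threshold. The sketch below uses `difflib.SequenceMatcher` with an assumed threshold of 0.9; the project's actual similarity measure and threshold may differ.

```python
from difflib import SequenceMatcher


def deduplicate(entries: list[tuple[float, str]], threshold: float = 0.9) -> list[tuple[float, str]]:
    """Drop (timestamp, text) entries nearly identical to the last kept entry."""
    kept: list[tuple[float, str]] = []
    for timestamp, text in entries:
        if kept and SequenceMatcher(None, kept[-1][1], text).ratio() >= threshold:
            continue  # same screen as before, skip it
        kept.append((timestamp, text))
    return kept
```

Comparing against the last kept entry, rather than the immediately previous frame, means a slide that changes a little at a time is eventually kept once it drifts far enough. Changes smaller than the threshold are still dropped, which is why subtle slide edits call for `--no-deduplicate`.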

Troubleshooting

"pytesseract not installed"

pip install pytesseract
sudo apt-get install tesseract-ocr  # Don't forget system package!

"No frames extracted"

Check that the video file is valid: ffmpeg -i video.mkv
Try a lower interval: --interval 3
Check that there is free disk space for the frames directory

Poor OCR quality

Try a different OCR engine: --ocr-engine easyocr or --ocr-engine paddleocr
Check that the video resolution is high enough for on-screen text to be legible
Use --no-deduplicate to keep more frames

Scene detection not working

The tool falls back to interval extraction automatically
Ensure FFmpeg is installed
Or skip --scene-detection and set an interval explicitly: --interval 5
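The automatic fallback can be sketched as: try scene detection, and switch to interval extraction if it raises or returns no frames. This is a hypothetical illustration; the two extractor callables stand in for whatever frame_extractor.py actually provides, and each is assumed to return frame timestamps.

```python
import logging
import subprocess
from typing import Callable, List

log = logging.getLogger(__name__)


def extract_with_fallback(
    scene_extract: Callable[[], List[float]],
    interval_extract: Callable[[], List[float]],
) -> List[float]:
    """Try scene detection first; use interval extraction if it fails or finds nothing."""
    try:
        frames = scene_extract()
    except (OSError, subprocess.CalledProcessError) as exc:
        # OSError covers a missing ffmpeg binary (FileNotFoundError)
        log.warning("scene detection failed (%s); falling back to interval extraction", exc)
        return interval_extract()
    if not frames:
        log.warning("scene detection found no frames; falling back to interval extraction")
        return interval_extract()
    return frames
```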

Project Structure

meetus/
├── meetus/                  # Main package
│   ├── __init__.py
│   ├── frame_extractor.py   # Video frame extraction
│   ├── ocr_processor.py     # OCR processing
│   └── transcript_merger.py # Transcript merging
├── process_meeting.py       # Main CLI script
├── requirements.txt         # Python dependencies
└── README.md                # This file

License

For personal use.
