commit 93e0c06d38ce9a06b25c6f04488f50f801456081
Author: Mariano Gabriel
Date:   Sun Oct 19 22:17:38 2025 -0300

    init commit

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..ad7a3dd
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,2 @@
+!samples/.gitkeep
+samples/*
diff --git a/MARIAN.md b/MARIAN.md
new file mode 100644
index 0000000..53e9137
--- /dev/null
+++ b/MARIAN.md
@@ -0,0 +1,50 @@
+I have several recorded meetings in which the interlocutor is sharing his/her screen. I had good success using OpenAI's Whisper for transcribing the audio and asking Claude to summarize it, providing caveats and observations (no image involved, transcript local, summarization online). I'd like to include something (locally) to extract information from what is shown on the screen to provide more context for the summarization.
+
+
+Great workflow! Adding visual context from screen shares will definitely improve the summarization quality. Here are some local options for extracting information from screen recordings:
+
+## OCR & Screen Analysis Tools
+
+**For extracting text from frames:**
+- **Tesseract OCR** - Open source, highly accurate, supports 100+ languages
+  - Use with Python: `pytesseract` wrapper
+  - Sample every N seconds of video to extract text
+
+- **EasyOCR** - Python library, good for detecting text in various orientations
+  - More robust for presentations/slides than Tesseract
+
+- **PaddleOCR** - Fast, accurate, works well with code/terminal screens
+
+**For video processing:**
+- **FFmpeg** - Extract frames at intervals (e.g., every 5-10 seconds or on scene changes)
+- **OpenCV** - Python library for video manipulation and frame extraction
+
+## Suggested Workflow
+
+```python
+# Rough approach:
+# 1. Use FFmpeg to detect scene changes or sample frames
+# 2. Run OCR on extracted frames
+# 3. Deduplicate similar text (consecutive slides are often similar)
+# 4. Timestamp each extracted text segment
+# 5. Combine with Whisper transcript, aligned by timestamp
+# 6. Send enhanced transcript to Claude
+```
+
+## Tips for Better Results
+
+- **Scene detection**: Extract frames only when screen content changes significantly (saves processing time)
+- **Deduplication**: Use similarity algorithms to avoid duplicate text from static slides
+- **Screenshot types**: Different OCR tools excel at different content:
+  - Tesseract: Documents, slides
+  - EasyOCR: Handwriting, rotated text
+  - PaddleOCR: Code, mixed languages
+
+- **Format for Claude**: Structure as:
+  ```
+  [00:05] Speaker: "Let me show you the dashboard..."
+  [SCREEN: Dashboard showing Q4 Revenue: $2.3M, Growth: 45%]
+  [00:12] Speaker: "As you can see, we exceeded targets..."
+  ```
+
+Would you like specific code examples for any of these approaches?
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..0732c59
--- /dev/null
+++ b/README.md
@@ -0,0 +1,239 @@
+# Meeting Processor
+
+Extract screen content from meeting recordings and merge it with Whisper transcripts for better Claude summarization.
+
+## Overview
+
+This tool enhances meeting transcripts by combining:
+- **Audio transcription** (from Whisper)
+- **Screen content** (OCR from screen shares)
+
+The result is a rich, timestamped transcript that provides full context for AI summarization.
+
+## Installation
+
+### 1. System Dependencies
+
+**Tesseract OCR** (recommended):
+```bash
+# Ubuntu/Debian
+sudo apt-get install tesseract-ocr
+
+# macOS
+brew install tesseract
+
+# Arch Linux
+sudo pacman -S tesseract
+```
+
+**FFmpeg** (for scene detection):
+```bash
+# Ubuntu/Debian
+sudo apt-get install ffmpeg
+
+# macOS
+brew install ffmpeg
+```
+
+### 2. Python Dependencies
+
+```bash
+pip install -r requirements.txt
+```
+
+### 3. Optional: Install Alternative OCR Engines
+
+```bash
+# EasyOCR (better for rotated/handwritten text)
+pip install easyocr
+
+# PaddleOCR (better for code/terminal screens)
+pip install paddleocr
+```
+
+## Quick Start
+
+### Basic Usage (Screen Content Only)
+
+```bash
+python process_meeting.py samples/meeting.mkv
+```
+
+This will:
+1. Extract frames every 5 seconds
+2. Run OCR to extract screen text
+3. Save the enhanced transcript to `meeting_enhanced.txt`
+
+### With Whisper Transcript
+
+First, generate a Whisper transcript:
+```bash
+whisper samples/meeting.mkv --model base --output_format json
+```
+
+Then process with screen content:
+```bash
+python process_meeting.py samples/meeting.mkv --transcript samples/meeting.json
+```
+
+## Usage Examples
+
+### Extract frames at different intervals
+```bash
+# Every 10 seconds
+python process_meeting.py samples/meeting.mkv --interval 10
+
+# Every 3 seconds (more detailed)
+python process_meeting.py samples/meeting.mkv --interval 3
+```
+
+### Use scene detection (smarter, fewer frames)
+```bash
+python process_meeting.py samples/meeting.mkv --scene-detection
+```
+
+### Use different OCR engines
+```bash
+# EasyOCR (good for varied layouts)
+python process_meeting.py samples/meeting.mkv --ocr-engine easyocr
+
+# PaddleOCR (good for code/terminal)
+python process_meeting.py samples/meeting.mkv --ocr-engine paddleocr
+```
+
+### Extract frames only (no merging)
+```bash
+python process_meeting.py samples/meeting.mkv --extract-only
+```
+
+### Custom output location
+```bash
+python process_meeting.py samples/meeting.mkv --output my_meeting.txt --frames-dir my_frames/
+```
+
+### Enable verbose logging
+```bash
+# Show detailed debug information
+python process_meeting.py samples/meeting.mkv --verbose
+
+# Short form
+python process_meeting.py samples/meeting.mkv -v
+```
+
+## Output Files
+
+After processing, you'll get:
+
+- **`