init commit

2025-10-19 22:17:38 -03:00
commit 93e0c06d38
10 changed files with 969 additions and 0 deletions
--- a/MARIAN.md
+++ b/MARIAN.md
@@ -0,0 +1,50 @@
+I have several recorded meetings in which the other participant is sharing his/her screen. I've had good success using OpenAI's Whisper to transcribe the audio and asking Claude to summarize it, including caveats and observations (no images involved; transcription runs locally, summarization online). I'd like to add something (running locally) to extract information from what is shown on screen, to give the summarization more context.
+
+
+Great workflow! Adding visual context from screen shares will definitely improve the summarization quality. Here are some local options for extracting information from screen recordings:
+
+## OCR & Screen Analysis Tools
+
+**For extracting text from frames:**
+- **Tesseract OCR** - Open source, highly accurate, supports 100+ languages
+  - Use with Python: `pytesseract` wrapper
+  - Sample every N seconds of video to extract text
+  
+- **EasyOCR** - Python library, good for detecting text in various orientations
+  - Often more robust than Tesseract on slides with stylized fonts or text over images
+  
+- **PaddleOCR** - Fast, accurate, works well with code/terminal screens
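As a rough illustration of the OCR step, here is a minimal sketch using `pytesseract`. It assumes frames have already been extracted to a folder as PNG files, and that the Tesseract binary plus the `pytesseract` and `Pillow` packages are installed. The helper names (`clean_ocr_text`, `ocr_frames`) and the 3-character noise threshold are illustrative choices, not part of any library.

```python
import re
from pathlib import Path


def clean_ocr_text(raw: str) -> str:
    """Collapse whitespace and drop very short lines, which are usually OCR noise."""
    lines = []
    for line in raw.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        if len(line) >= 3:  # assumed threshold: 1-2 character lines are mostly noise
            lines.append(line)
    return "\n".join(lines)


def ocr_frames(frame_dir: str, lang: str = "eng") -> dict[str, str]:
    """Run Tesseract on every PNG in frame_dir; returns {filename: cleaned text}."""
    # Imported lazily so clean_ocr_text can be used without Tesseract installed.
    import pytesseract
    from PIL import Image

    results = {}
    for path in sorted(Path(frame_dir).glob("*.png")):
        with Image.open(path) as img:
            raw = pytesseract.image_to_string(img, lang=lang)
        results[path.name] = clean_ocr_text(raw)
    return results
```

For example, `ocr_frames("frames")` would return a dict mapping each frame's filename to its cleaned text. To compare against EasyOCR, `easyocr.Reader(["en"]).readtext(str(path), detail=0)` returns a list of detected strings that could be joined with newlines and passed through the same `clean_ocr_text` step.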
+
+**For video processing:**
+- **FFmpeg** - Extract frames at intervals (e.g., every 5-10 seconds or on scene changes)
+- **OpenCV** - Python library for video manipulation and frame extraction
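A minimal sketch of the two FFmpeg sampling strategies, built as argument lists for `subprocess`. The `fps`, `select` (with its `scene` score) and `showinfo` filters are standard FFmpeg filters; the 10-second interval, the 0.3 scene threshold, the output file names and the helper names are assumptions to tune per recording.

```python
import subprocess
from pathlib import Path


def ffmpeg_interval_cmd(video: str, out_dir: str, every_s: float = 10.0) -> list[str]:
    """Build an ffmpeg command that saves one frame every `every_s` seconds."""
    return [
        "ffmpeg", "-i", video,
        "-vf", f"fps=1/{every_s:g}",  # e.g. fps=1/10 -> one frame per 10 s
        str(Path(out_dir) / "frame_%05d.png"),  # files are numbered from 1
    ]


def ffmpeg_scene_cmd(video: str, out_dir: str, threshold: float = 0.3) -> list[str]:
    """Build an ffmpeg command that saves a frame only on a scene change."""
    return [
        "ffmpeg", "-i", video,
        # select keeps frames whose scene-change score exceeds the threshold;
        # showinfo logs each kept frame (including pts_time) to stderr.
        "-vf", f"select='gt(scene,{threshold:g})',showinfo",
        "-vsync", "vfr",  # write only the selected frames, no duplicated fillers
        str(Path(out_dir) / "scene_%05d.png"),
    ]


def interval_frame_time(index: int, every_s: float) -> float:
    """Approximate timestamp (seconds) of frame_<index>.png from the interval command."""
    return (index - 1) * every_s


def extract(cmd: list[str], out_dir: str) -> None:
    """Create the output folder, then run ffmpeg (raises if ffmpeg fails)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)
```

For example, `extract(ffmpeg_interval_cmd("meeting.mp4", "frames", 10), "frames")`. Scene-change output is not evenly spaced, so its timestamps have to be read from the `pts_time:` values that `showinfo` prints to stderr. On recent FFmpeg releases `-vsync` is deprecated in favor of `-fps_mode vfr`.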
+
+## Suggested Workflow
+
+Rough approach:
+
+1. Use FFmpeg to detect scene changes or sample frames at a fixed interval
+2. Run OCR on the extracted frames
+3. Deduplicate similar text (consecutive frames of the same slide produce nearly identical text)
+4. Timestamp each extracted text segment
+5. Merge with the Whisper transcript, aligned by timestamp
+6. Send the enhanced transcript to Claude
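Step 3 can be sketched with Python's standard-library `difflib`: each frame's OCR text is compared with the last frame that was kept, and near-identical text is dropped. The 0.9 similarity threshold and the `(seconds, text)` input format are assumptions; OCR output tends to jitter slightly between frames of the same slide, so an exact-match check would miss many duplicates.

```python
from difflib import SequenceMatcher


def dedupe_screens(frames: list[tuple[float, str]], threshold: float = 0.9) -> list[tuple[float, str]]:
    """Keep a frame only if its OCR text differs enough from the last kept frame.

    frames: (timestamp_seconds, ocr_text) pairs sorted by time.
    threshold: similarity ratio (0-1) at or above which a frame counts as a duplicate.
    """
    kept: list[tuple[float, str]] = []
    for ts, text in frames:
        if not text.strip():
            continue  # blank OCR (e.g. a camera feed with no text) adds nothing
        if kept and SequenceMatcher(None, kept[-1][1], text).ratio() >= threshold:
            continue  # same slide, only minor OCR differences
        kept.append((ts, text))
    return kept
```

Comparing only against the last kept frame means that returning to an earlier slide produces a new entry, which is usually desirable because it marks when the slide was discussed again.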
+
+## Tips for Better Results
+
+- **Scene detection**: Extract frames only when screen content changes significantly (saves processing time)
+- **Deduplication**: Compare the OCR text of consecutive frames with a string-similarity ratio and drop near-duplicates produced by static slides
+- **Content types**: Different OCR tools tend to excel at different content:
+  - Tesseract: clean documents, high-contrast slides
+  - EasyOCR: rotated or stylized text, text over images
+  - PaddleOCR: code, dense UIs, mixed languages
+
+- **Format for Claude**: Structure as:
+  ```
+  [00:05] Speaker: "Let me show you the dashboard..."
+  [SCREEN: Dashboard showing Q4 Revenue: $2.3M, Growth: 45%]
+  [00:12] Speaker: "As you can see, we exceeded targets..."
+  ```
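A minimal sketch of steps 4 and 5, producing the format above. It assumes transcript segments shaped like the `segments` list returned by openai-whisper's `transcribe()` (each with `start` and `text` keys) and screen text as `(seconds, text)` pairs. Whisper itself does not identify speakers, so the generic `Speaker` label is a placeholder; `merge_transcript` and `fmt_ts` are hypothetical helper names.

```python
def fmt_ts(seconds: float) -> str:
    """Format seconds as [MM:SS]; minutes may exceed 59 for long meetings."""
    m, s = divmod(int(seconds), 60)
    return f"[{m:02d}:{s:02d}]"


def merge_transcript(segments: list[dict], screens: list[tuple[float, str]]) -> str:
    """Interleave Whisper segments and OCR'd screen text in time order.

    segments: dicts with "start" (seconds) and "text" keys.
    screens:  (timestamp_seconds, text) pairs, e.g. deduplicated OCR output.
    """
    # kind 0 = speech, kind 1 = screen; speech sorts first at an equal timestamp.
    events = [
        (seg["start"], 0, f'{fmt_ts(seg["start"])} Speaker: "{seg["text"].strip()}"')
        for seg in segments
    ]
    # Multi-line screen text is collapsed onto one line, separated by " | ".
    events += [
        (ts, 1, f'[SCREEN: {" | ".join(text.splitlines())}]')
        for ts, text in screens
    ]
    events.sort(key=lambda e: (e[0], e[1]))
    return "\n".join(line for _, _, line in events)
```

With openai-whisper, the segments could come from `whisper.load_model("small").transcribe("meeting.mp4")["segments"]`, and the returned string can be sent to Claude in place of the plain transcript.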
+
+Would you like specific code examples for any of these approaches?