add whisper to main command, ignore output files

This commit is contained in:
Mariano Gabriel
2025-10-19 22:49:36 -03:00
parent 93e0c06d38
commit ae89564373
5 changed files with 183 additions and 50 deletions

README.md

@@ -41,7 +41,13 @@ brew install ffmpeg
pip install -r requirements.txt
```
### 3. Optional: Install Alternative OCR Engines
### 3. Whisper (for audio transcription)
```bash
pip install openai-whisper
```
### 4. Optional: Install Alternative OCR Engines
```bash
# EasyOCR (better for rotated/handwritten text)
@@ -53,52 +59,67 @@ pip install paddleocr
## Quick Start
### Basic Usage (Screen Content Only)
### Recommended: Run Everything in One Command
```bash
python process_meeting.py samples/meeting.mkv --run-whisper
```
This will:
1. Run Whisper transcription (audio → text)
2. Extract frames every 5 seconds
3. Run OCR to extract screen text
4. Merge audio + screen content
5. Save everything to `output/` folder
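Step 4 (merging audio and screen content) is essentially an interleave of two timestamped streams. A minimal sketch of that idea — the dict shapes (`start`/`text` for Whisper segments, `timestamp`/`text` for OCR captures) are illustrative assumptions, not process_meeting.py's actual internal format:

```python
# Hedged sketch of the merge step: interleave Whisper segments and OCR
# captures by timestamp. Field names are assumptions for illustration.
def merge_transcript_and_ocr(segments, ocr_entries):
    """Return a single chronologically ordered list of labeled lines."""
    events = [(s["start"], f'[{s["start"]:.1f}s AUDIO] {s["text"].strip()}')
              for s in segments]
    events += [(o["timestamp"], f'[{o["timestamp"]:.1f}s SCREEN] {o["text"].strip()}')
               for o in ocr_entries]
    events.sort(key=lambda e: e[0])  # chronological order across both streams
    return [line for _, line in events]

segments = [{"start": 0.0, "end": 4.2, "text": " Welcome everyone"}]
ocr = [{"timestamp": 5.0, "text": "def main():"}]
print(merge_transcript_and_ocr(segments, ocr))
```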
### Alternative: Use Existing Whisper Transcript
If you already have a Whisper transcript:
```bash
python process_meeting.py samples/meeting.mkv --transcript output/meeting.json
```
### Screen Content Only (No Audio)
```bash
python process_meeting.py samples/meeting.mkv
```
This will:
1. Extract frames every 5 seconds
2. Run OCR to extract screen text
3. Save enhanced transcript to `meeting_enhanced.txt`
### With Whisper Transcript
First, generate a Whisper transcript:
```bash
whisper samples/meeting.mkv --model base --output_format json
```
Then process with screen content:
```bash
python process_meeting.py samples/meeting.mkv --transcript samples/meeting.json
```
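The JSON file produced by the `whisper` CLI has a top-level `segments` list, each entry carrying `start`, `end`, and `text`. A minimal reader sketch (using an inline sample instead of a real `meeting.json`):

```python
# Read a Whisper-style JSON transcript and print timestamped lines.
# The inline sample mirrors whisper's --output_format json layout.
import io
import json

sample = io.StringIO(json.dumps({
    "text": " Welcome everyone.",
    "segments": [{"id": 0, "start": 0.0, "end": 3.5, "text": " Welcome everyone."}],
    "language": "en",
}))
data = json.load(sample)  # in practice: json.load(open("output/meeting.json"))
for seg in data["segments"]:
    print(f'{seg["start"]:.1f}-{seg["end"]:.1f}: {seg["text"].strip()}')
```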
## Usage Examples
### Run with different Whisper models
```bash
# Tiny model (fastest, less accurate)
python process_meeting.py samples/meeting.mkv --run-whisper --whisper-model tiny
# Small model (balanced)
python process_meeting.py samples/meeting.mkv --run-whisper --whisper-model small
# Large model (slowest, most accurate)
python process_meeting.py samples/meeting.mkv --run-whisper --whisper-model large
```
### Extract frames at different intervals
```bash
# Every 10 seconds
python process_meeting.py samples/meeting.mkv --interval 10
# Every 10 seconds (with Whisper)
python process_meeting.py samples/meeting.mkv --run-whisper --interval 10
# Every 3 seconds (more detailed)
python process_meeting.py samples/meeting.mkv --interval 3
python process_meeting.py samples/meeting.mkv --run-whisper --interval 3
```
### Use scene detection (smarter, fewer frames)
```bash
python process_meeting.py samples/meeting.mkv --scene-detection
python process_meeting.py samples/meeting.mkv --run-whisper --scene-detection
```
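Why scene detection yields fewer frames: instead of sampling every N seconds, a frame is only kept when it differs enough from the last kept one. A conceptual sketch with toy grayscale frames — real scene detection works on actual video pixels (e.g. via ffmpeg or OpenCV), and the threshold here is arbitrary:

```python
# Conceptual sketch of scene-change frame selection using toy
# grayscale "frames" (flat lists of pixel values).
def mean_abs_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def select_scene_changes(frames, threshold=30.0):
    """Keep frame 0 plus any frame that differs enough from the last kept one."""
    kept = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept

static = [10] * 100    # a slide that stays on screen
changed = [200] * 100  # a new slide appears
frames = [static, static, static, changed, changed]
print(select_scene_changes(frames))  # keeps 2 frames instead of 5
```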
### Use different OCR engines
```bash
# EasyOCR (good for varied layouts)
python process_meeting.py samples/meeting.mkv --ocr-engine easyocr
python process_meeting.py samples/meeting.mkv --run-whisper --ocr-engine easyocr
# PaddleOCR (good for code/terminal)
python process_meeting.py samples/meeting.mkv --ocr-engine paddleocr
python process_meeting.py samples/meeting.mkv --run-whisper --ocr-engine paddleocr
```
### Extract frames only (no merging)
@@ -108,41 +129,48 @@ python process_meeting.py samples/meeting.mkv --extract-only
### Custom output location
```bash
python process_meeting.py samples/meeting.mkv --output my_meeting.txt --frames-dir my_frames/
python process_meeting.py samples/meeting.mkv --run-whisper --output-dir my_outputs/
```
### Enable verbose logging
```bash
# Show detailed debug information
python process_meeting.py samples/meeting.mkv --verbose
# Short form
python process_meeting.py samples/meeting.mkv -v
python process_meeting.py samples/meeting.mkv --run-whisper --verbose
```
## Output Files
After processing, you'll get:
All output files are saved to the `output/` directory by default:
- **`<video>_enhanced.txt`** - Enhanced transcript ready for Claude
- **`<video>_ocr.json`** - Raw OCR data with timestamps
- **`output/<video>_enhanced.txt`** - Enhanced transcript ready for Claude
- **`output/<video>.json`** - Whisper transcript (if `--run-whisper` was used)
- **`output/<video>_ocr.json`** - Raw OCR data with timestamps
- **`frames/`** - Extracted video frames (JPG files)
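The naming convention above can be sketched with `pathlib` — output paths are derived from the video filename, with `output/` and `frames/` as the stated defaults:

```python
# Sketch of the output naming convention; derives paths from the
# video filename using the documented defaults.
from pathlib import Path

video = Path("samples/meeting.mkv")
out_dir = Path("output")

enhanced = out_dir / f"{video.stem}_enhanced.txt"
whisper_json = out_dir / f"{video.stem}.json"  # only written with --run-whisper
ocr_json = out_dir / f"{video.stem}_ocr.json"
print(enhanced, whisper_json, ocr_json)
```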
## Workflow for Meeting Analysis
### Complete Workflow
### Complete Workflow (One Command!)
```bash
# 1. Extract audio and transcribe with Whisper
whisper samples/alo-intro1.mkv --model base --output_format json
# Process everything in one step
python process_meeting.py samples/alo-intro1.mkv --run-whisper --scene-detection
# Output will be in output/alo-intro1_enhanced.txt
```
### Traditional Workflow (Separate Steps)
```bash
# 1. Extract audio and transcribe with Whisper (optional, if not using --run-whisper)
whisper samples/alo-intro1.mkv --model base --output_format json --output_dir output
# 2. Process video to extract screen content
python process_meeting.py samples/alo-intro1.mkv \
--transcript samples/alo-intro1.json \
--transcript output/alo-intro1.json \
--scene-detection
# 3. Use the enhanced transcript with Claude
# Copy the content from alo-intro1_enhanced.txt and paste into Claude
# Copy the content from output/alo-intro1_enhanced.txt and paste into Claude
```
### Example Prompt for Claude
@@ -160,7 +188,9 @@ Please summarize this meeting transcript. Pay special attention to:
## Command Reference
```
usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--output OUTPUT]
usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--run-whisper]
[--whisper-model {tiny,base,small,medium,large}]
[--output OUTPUT] [--output-dir OUTPUT_DIR]
[--frames-dir FRAMES_DIR] [--interval INTERVAL]
[--scene-detection]
[--ocr-engine {tesseract,easyocr,paddleocr}]
@@ -171,7 +201,10 @@ usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--output OUTPUT]
Options:
video Path to video file
--transcript, -t Path to Whisper transcript (JSON or TXT)
--run-whisper Run Whisper transcription before processing
--whisper-model Whisper model: tiny, base, small, medium, large (default: base)
--output, -o Output file for enhanced transcript
--output-dir Directory for output files (default: output/)
--frames-dir Directory to save extracted frames (default: frames/)
--interval Extract frame every N seconds (default: 5)
--scene-detection Use scene detection instead of interval extraction