add whisper to main command, ignore output files

README.md

```bash
pip install -r requirements.txt
```

### 3. Whisper (for audio transcription)

```bash
pip install openai-whisper
```

### 4. Optional: Install Alternative OCR Engines

```bash
# EasyOCR (better for rotated/handwritten text)
pip install easyocr

pip install paddleocr
```

## Quick Start

### Recommended: Run Everything in One Command

```bash
python process_meeting.py samples/meeting.mkv --run-whisper
```

This will:
1. Run Whisper transcription (audio → text)
2. Extract frames every 5 seconds
3. Run OCR to extract screen text
4. Merge audio + screen content
5. Save everything to the `output/` folder
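Steps 1-4 boil down to aligning two timestamped streams. As a rough illustration (not `process_meeting.py`'s actual code; `merge_streams` and the record shapes are hypothetical), merging audio segments with OCR'd frames is essentially a sort by timestamp:

```python
# Hypothetical sketch of step 4 (merging audio + screen content), NOT the
# actual process_meeting.py implementation: interleave Whisper segments and
# OCR'd frames by timestamp into one chronological transcript.

def merge_streams(whisper_segments, ocr_frames):
    """Merge [{"start", "end", "text"}] audio segments with
    [{"timestamp", "text"}] OCR frames, ordered by time."""
    events = [(s["start"], "AUDIO", s["text"]) for s in whisper_segments]
    events += [(f["timestamp"], "SCREEN", f["text"]) for f in ocr_frames]
    events.sort(key=lambda e: e[0])  # chronological order across both streams
    return "\n".join(f"[{t:07.2f}] {kind}: {text}" for t, kind, text in events)

segments = [{"start": 0.0, "end": 4.2, "text": "Welcome everyone."},
            {"start": 4.2, "end": 9.0, "text": "Let's look at the dashboard."}]
frames = [{"timestamp": 5.0, "text": "Q3 Revenue: $1.2M"}]
print(merge_streams(segments, frames))
```

The interleaved output is what makes the enhanced transcript useful: screen text lands next to the words spoken at the same moment.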
### Alternative: Use Existing Whisper Transcript

If you already have a Whisper transcript:
```bash
python process_meeting.py samples/meeting.mkv --transcript output/meeting.json
```
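Whisper's `--output_format json` file carries the full text plus per-segment timings, which is what the `--transcript` option consumes. A small sketch of reading it (the sample data here is invented; `timed_segments` is a hypothetical helper, not part of the tool):

```python
# Sketch: what a Whisper JSON transcript (whisper --output_format json) looks
# like and how to pull out the timed segments. Field names match Whisper's
# output; the sample transcript content is made up.
import json

sample = json.loads("""
{"text": " Welcome everyone. Let's start.",
 "language": "en",
 "segments": [
   {"id": 0, "start": 0.0, "end": 2.1, "text": " Welcome everyone."},
   {"id": 1, "start": 2.1, "end": 4.0, "text": " Let's start."}]}
""")

def timed_segments(data):
    """Return (start, end, text) tuples from a loaded Whisper transcript."""
    return [(s["start"], s["end"], s["text"].strip()) for s in data["segments"]]

for start, end, text in timed_segments(sample):
    print(f"{start:6.2f}-{end:6.2f}  {text}")
```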

### Screen Content Only (No Audio)

```bash
python process_meeting.py samples/meeting.mkv
```

This will:
1. Extract frames every 5 seconds
2. Run OCR to extract screen text
3. Save the enhanced transcript to `output/meeting_enhanced.txt`

### With Whisper Transcript

First, generate a Whisper transcript:
```bash
whisper samples/meeting.mkv --model base --output_format json --output_dir output
```

Then process with screen content:
```bash
python process_meeting.py samples/meeting.mkv --transcript output/meeting.json
```

## Usage Examples

### Run with different Whisper models
```bash
# Tiny model (fastest, less accurate)
python process_meeting.py samples/meeting.mkv --run-whisper --whisper-model tiny

# Small model (balanced)
python process_meeting.py samples/meeting.mkv --run-whisper --whisper-model small

# Large model (slowest, most accurate)
python process_meeting.py samples/meeting.mkv --run-whisper --whisper-model large
```

### Extract frames at different intervals
```bash
# Every 10 seconds
python process_meeting.py samples/meeting.mkv --interval 10
# Every 10 seconds (with Whisper)
python process_meeting.py samples/meeting.mkv --run-whisper --interval 10

# Every 3 seconds (more detailed)
python process_meeting.py samples/meeting.mkv --interval 3
python process_meeting.py samples/meeting.mkv --run-whisper --interval 3
```
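Fixed-interval extraction maps naturally onto ffmpeg's `fps` filter: an interval of N seconds is an output rate of 1/N frames per second. A sketch of the command this corresponds to (`ffmpeg_frame_command` is illustrative, not the tool's internals):

```python
# Sketch of how fixed-interval frame extraction maps onto ffmpeg's fps filter
# (process_meeting.py's internals may differ). An interval of N seconds is an
# output rate of 1/N frames per second.

def ffmpeg_frame_command(video, frames_dir, interval):
    """Build an ffmpeg command that writes one JPG every `interval` seconds."""
    return ["ffmpeg", "-i", video,
            "-vf", f"fps=1/{interval}",      # sample 1 frame per N seconds
            f"{frames_dir}/frame_%04d.jpg"]  # numbered output files

cmd = ffmpeg_frame_command("samples/meeting.mkv", "frames", 10)
print(" ".join(cmd))
```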

### Use scene detection (smarter, fewer frames)
```bash
python process_meeting.py samples/meeting.mkv --scene-detection
python process_meeting.py samples/meeting.mkv --run-whisper --scene-detection
```
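Scene detection keeps a frame only when the picture has changed enough since the last kept frame, which is why it yields far fewer frames on slide-style content. A toy illustration of the idea (pure Python, not the tool's actual algorithm):

```python
# Toy sketch of scene detection on grayscale frames: keep a frame only when it
# differs enough from the last kept frame. Illustrates the idea only; it is
# NOT process_meeting.py's actual algorithm.

def detect_scene_frames(frames, threshold=30.0):
    """frames: list of equally sized lists of 0-255 pixel values.
    Returns indices of frames that start a new 'scene'."""
    kept = [0]                      # always keep the first frame
    for i in range(1, len(frames)):
        ref = frames[kept[-1]]
        # mean absolute pixel difference vs. the last kept frame
        diff = sum(abs(a - b) for a, b in zip(frames[i], ref)) / len(ref)
        if diff > threshold:        # big change => new slide/scene
            kept.append(i)
    return kept

slide_a = [10] * 100
slide_b = [200] * 100
frames = [slide_a, slide_a, slide_b, slide_b, slide_a]
print(detect_scene_frames(frames))  # → [0, 2, 4]
```

Two identical slides in a row produce no new frame, so long static stretches collapse to a single capture.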

### Use different OCR engines
```bash
# EasyOCR (good for varied layouts)
python process_meeting.py samples/meeting.mkv --ocr-engine easyocr
python process_meeting.py samples/meeting.mkv --run-whisper --ocr-engine easyocr

# PaddleOCR (good for code/terminal)
python process_meeting.py samples/meeting.mkv --ocr-engine paddleocr
python process_meeting.py samples/meeting.mkv --run-whisper --ocr-engine paddleocr
```

### Extract frames only (no merging)

```bash
python process_meeting.py samples/meeting.mkv --extract-only
```

### Custom output location
```bash
python process_meeting.py samples/meeting.mkv --output my_meeting.txt --frames-dir my_frames/
python process_meeting.py samples/meeting.mkv --run-whisper --output-dir my_outputs/
```

### Enable verbose logging
```bash
# Show detailed debug information
python process_meeting.py samples/meeting.mkv --verbose

# Short form
python process_meeting.py samples/meeting.mkv -v
python process_meeting.py samples/meeting.mkv --run-whisper --verbose
```

## Output Files

All output files are saved to the `output/` directory by default:

- **`output/<video>_enhanced.txt`** - Enhanced transcript ready for Claude
- **`output/<video>.json`** - Whisper transcript (if `--run-whisper` was used)
- **`output/<video>_ocr.json`** - Raw OCR data with timestamps
- **`frames/`** - Extracted video frames (JPG files)

## Workflow for Meeting Analysis

### Complete Workflow (One Command!)

```bash
# Process everything in one step
python process_meeting.py samples/alo-intro1.mkv --run-whisper --scene-detection

# Output will be in output/alo-intro1_enhanced.txt
```

### Traditional Workflow (Separate Steps)

```bash
# 1. Extract audio and transcribe with Whisper (optional, if not using --run-whisper)
whisper samples/alo-intro1.mkv --model base --output_format json --output_dir output

# 2. Process video to extract screen content
python process_meeting.py samples/alo-intro1.mkv \
    --transcript output/alo-intro1.json \
    --scene-detection

# 3. Use the enhanced transcript with Claude
# Copy the content from output/alo-intro1_enhanced.txt and paste into Claude
```

### Example Prompt for Claude

Please summarize this meeting transcript. Pay special attention to:

## Command Reference

```
usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--run-whisper]
                          [--whisper-model {tiny,base,small,medium,large}]
                          [--output OUTPUT] [--output-dir OUTPUT_DIR]
                          [--frames-dir FRAMES_DIR] [--interval INTERVAL]
                          [--scene-detection]
                          [--ocr-engine {tesseract,easyocr,paddleocr}]

Options:
  video              Path to video file
  --transcript, -t   Path to Whisper transcript (JSON or TXT)
  --run-whisper      Run Whisper transcription before processing
  --whisper-model    Whisper model: tiny, base, small, medium, large (default: base)
  --output, -o       Output file for enhanced transcript
  --output-dir       Directory for output files (default: output/)
  --frames-dir       Directory to save extracted frames (default: frames/)
  --interval         Extract frame every N seconds (default: 5)
  --scene-detection  Use scene detection instead of interval extraction
```