updated readme
README.md: 76 lines changed
@@ -46,25 +46,19 @@ For speaker diarization, you'll need a HuggingFace token with access to pyannote
 
 ## Quick Start
 
-### Recommended: Embed Frames with Scene Detection
+### Recommended Usage
 
 ```bash
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
 ```
 
 This will:
-1. Run Whisper transcription (audio → text)
-2. Extract frames at scene changes (smarter than fixed intervals)
-3. Embed frame references in the transcript for LLM analysis
+1. Run WhisperX transcription with speaker diarization
+2. Extract frames at scene changes (threshold 10 = moderately sensitive)
+3. Create an enhanced transcript with frame file references
 4. Save everything to `output/` folder
 
-### With Speaker Diarization (WhisperX)
+The `--embed-images` flag adds frame paths to the transcript (e.g., `Frame: frames/video_00257.jpg`), keeping the transcript small while frames stay in `frames/` folder for LLM access.
 
-```bash
-python process_meeting.py samples/meeting.mkv --run-whisper --diarize --embed-images --scene-detection
-```
-
-This uses WhisperX to identify different speakers in the transcript.
-
 ### Re-run with Cached Results
 
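The new Quick Start text says `--embed-images` writes frame paths such as `Frame: frames/video_00257.jpg` into the transcript instead of the images themselves. The README does not show how those lines are placed, so the following is only a minimal Python sketch of one plausible merge, not code from `process_meeting.py`. It assumes Whisper-style segments with `start`, `end` and `text` in seconds and frames known by capture time; `Segment`, `embed_frame_refs`, the `[start-end]` line format and the `video_%05d.jpg` names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def embed_frame_refs(segments, frames):
    """Interleave 'Frame: <path>' lines with transcript lines (hypothetical helper).

    segments: Segment objects sorted by start time.
    frames:   (timestamp_seconds, path) tuples sorted by timestamp.
    Each frame is listed just before the first segment that starts at or after
    its timestamp; frames later than the last segment start go at the end.
    """
    lines = []
    i = 0
    for seg in segments:
        # Emit every frame captured up to the moment this segment begins.
        while i < len(frames) and frames[i][0] <= seg.start:
            lines.append(f"Frame: {frames[i][1]}")
            i += 1
        lines.append(f"[{seg.start:.1f}-{seg.end:.1f}] {seg.text.strip()}")
    # Remaining frames were captured after the last segment started.
    for _, path in frames[i:]:
        lines.append(f"Frame: {path}")
    return "\n".join(lines)


if __name__ == "__main__":
    segments = [
        Segment(0.0, 4.2, " Welcome, everyone."),
        Segment(4.2, 9.0, " Let's look at the roadmap."),
    ]
    frames = [(0.0, "frames/video_00000.jpg"), (5.0, "frames/video_00005.jpg")]
    print(embed_frame_refs(segments, frames))
    # Frame: frames/video_00000.jpg
    # [0.0-4.2] Welcome, everyone.
    # [4.2-9.0] Let's look at the roadmap.
    # Frame: frames/video_00005.jpg
```

Listing a frame just before the first segment that starts at or after its timestamp keeps each reference next to the speech that follows it; attaching frames to the segment during which they were captured would be an equally reasonable choice.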
@@ -76,48 +70,38 @@ python process_meeting.py samples/meeting.mkv --embed-images
 # Skip only specific cached items
 python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-frames
 python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-whisper
-python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-analysis
 
 # Force complete reprocessing
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --no-cache
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --no-cache
 ```
 
 ## Usage Examples
 
 ### Scene Detection Options
 ```bash
-# Default scene detection (threshold: 15)
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
+# Default threshold (15)
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize
 
-# More sensitive (more frames captured, threshold: 5)
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --scene-threshold 5
+# More sensitive (more frames, threshold: 5)
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --diarize
 
 # Less sensitive (fewer frames, threshold: 30)
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --scene-threshold 30
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 30 --diarize
 ```
 
 ### Fixed Interval Extraction (alternative to scene detection)
 ```bash
 # Every 10 seconds
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --interval 10
+python process_meeting.py samples/meeting.mkv --embed-images --interval 10 --diarize
 
 # Every 3 seconds (more detailed)
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --interval 3
+python process_meeting.py samples/meeting.mkv --embed-images --interval 3 --diarize
-```
-
-### Frame Quality Options
-```bash
-# Default quality (80)
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
-
-# Lower quality for smaller files (60)
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --embed-quality 60
 ```
 
 ### Caching Examples
 ```bash
 # First run - processes everything
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
 
 # Iterate on scene threshold (reuse whisper transcript)
 python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --skip-cache-frames --skip-cache-analysis
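The scene-detection examples in this hunk keep the same threshold semantics (default 15, 5 captures more frames, 30 fewer) alongside the fixed-interval alternative. The README only states that scene detection uses FFmpeg, so the mapping below is an assumption: the threshold is treated as a percentage of FFmpeg's 0-1 `scene` score, and fixed intervals use the `fps` filter. `build_ffmpeg_cmd` and the output file pattern are hypothetical.

```python
import os


def build_ffmpeg_cmd(video, out_dir, scene_threshold=None, interval=None):
    """Build an ffmpeg argv list for frame extraction (hypothetical helper).

    scene_threshold: percentage such as 15, assumed to map onto FFmpeg's
                     0-1 scene score (15 -> 0.15).
    interval:        seconds between frames for fixed-interval extraction.
    Exactly one of the two must be given.
    """
    if (scene_threshold is None) == (interval is None):
        raise ValueError("pass exactly one of scene_threshold or interval")

    if scene_threshold is not None:
        score = scene_threshold / 100
        # The single quotes are FFmpeg filtergraph quoting: they stop the comma
        # inside gt(...) from being read as a filter separator.
        vf = f"select='gt(scene,{score:g})'"
    else:
        vf = f"fps=1/{interval:g}"  # one frame every `interval` seconds

    cmd = ["ffmpeg", "-i", video, "-vf", vf]
    if scene_threshold is not None:
        # Keep only the selected frames instead of letting ffmpeg pad the gaps
        # with duplicates (-fps_mode needs FFmpeg 5.1+; older builds use -vsync vfr).
        cmd += ["-fps_mode", "vfr"]
    cmd.append(os.path.join(out_dir, "video_%05d.jpg"))
    return cmd
```

The returned list can be passed straight to `subprocess.run(cmd, check=True)` without shell quoting. A lower threshold lets smaller visual changes pass the `gt(scene,x)` test, which is why 5 yields more frames than 30.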
@@ -126,17 +110,17 @@ python process_meeting.py samples/meeting.mkv --embed-images --scene-detection -
 python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-whisper
 
 # Force complete reprocessing
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --no-cache
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --no-cache
 ```
 
 ### Custom output location
 ```bash
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --output-dir my_outputs/
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --output-dir my_outputs/
 ```
 
 ### Enable verbose logging
 ```bash
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --verbose
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --verbose
 ```
 
 ## Output Files
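The cached-results examples combine three per-stage flags with `--no-cache`. Below is a minimal sketch of how such flags could decide which stages re-run. It assumes three stages named whisper, frames and analysis, and that the flags do not cascade (the README's threshold example passes `--skip-cache-frames` and `--skip-cache-analysis` together, which suggests they don't). `CacheFlags`, `stages_to_run` and the stage names are hypothetical.

```python
from dataclasses import dataclass

STAGES = ("whisper", "frames", "analysis")  # assumed pipeline order


@dataclass
class CacheFlags:
    """Hypothetical mirror of the caching flags used in the README."""
    no_cache: bool = False             # --no-cache
    skip_cache_whisper: bool = False   # --skip-cache-whisper
    skip_cache_frames: bool = False    # --skip-cache-frames
    skip_cache_analysis: bool = False  # --skip-cache-analysis


def stages_to_run(flags, cached):
    """Return, in pipeline order, the stages that must be recomputed.

    cached: set of stage names whose cached output already exists.
    Flags do not cascade: re-extracting frames does not by itself refresh
    the analysis, matching README examples that pass both skip flags.
    """
    ignore_cache = {
        "whisper": flags.skip_cache_whisper,
        "frames": flags.skip_cache_frames,
        "analysis": flags.skip_cache_analysis,
    }
    return [
        stage for stage in STAGES
        if flags.no_cache or ignore_cache[stage] or stage not in cached
    ]


if __name__ == "__main__":
    all_cached = set(STAGES)
    # Iterating on the scene threshold: reuse the Whisper transcript.
    print(stages_to_run(CacheFlags(skip_cache_frames=True, skip_cache_analysis=True), all_cached))
    # ['frames', 'analysis']
```

A missing cache entry always forces its stage to run, so a first run and a `--no-cache` run both end up recomputing everything.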
@@ -175,24 +159,17 @@ This allows you to iterate on scene detection thresholds without re-running Whis
 ### Complete Workflow (One Command!)
 
 ```bash
-# Process everything in one step with scene detection
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
-
-# With speaker diarization
-python process_meeting.py samples/meeting.mkv --run-whisper --diarize --embed-images --scene-detection
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
 ```
 
 ### Typical Iterative Workflow
 
 ```bash
 # First run - full processing
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
 
 # Adjust scene threshold (keeps cached whisper transcript)
-python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --skip-cache-frames --skip-cache-analysis
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --skip-cache-frames --skip-cache-analysis
-
-# Try different frame quality
-python process_meeting.py samples/meeting.mkv --embed-images --embed-quality 60 --skip-cache-frames --skip-cache-analysis
 ```
 
 ### Example Prompt for Claude
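After this hunk the Complete Workflow is a single command that always passes `--diarize`, so WhisperX attaches speaker labels (the README notes this needs a HuggingFace token with access to pyannote). WhisperX ships its own helper for merging diarization output with the transcript; the Python below is an independent, simplified illustration of the usual idea, not WhisperX's implementation: each segment goes to the speaker whose diarization turns overlap it longest. The dict keys, the `(start, end, speaker)` tuple layout and the `assign_speakers` name are assumptions.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each segment with the speaker that overlaps it the most.

    segments: dicts with 'start', 'end' (seconds) and 'text'.
    turns:    (start, end, speaker) tuples from a diarization model.
    Segments with no overlapping turn get speaker None.
    """
    labelled = []
    for seg in segments:
        totals = {}
        for t_start, t_end, speaker in turns:
            ov = overlap(seg["start"], seg["end"], t_start, t_end)
            if ov > 0:
                # A speaker may have several turns inside one segment.
                totals[speaker] = totals.get(speaker, 0.0) + ov
        best = max(totals, key=totals.get) if totals else None
        labelled.append({**seg, "speaker": best})
    return labelled
```

Summing overlap per speaker rather than picking the single longest turn keeps a segment with interleaved back-and-forth assigned to whoever talked most within it.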
@@ -223,11 +200,8 @@ usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--run-whisper]
 
 Main Options:
   video Path to video file
-  --run-whisper Run Whisper transcription before processing
-  --whisper-model Whisper model: tiny, base, small, medium, large (default: medium)
   --diarize Use WhisperX with speaker diarization
-  --embed-images Embed frame references for LLM analysis (recommended)
+  --embed-images Add frame file references to transcript (recommended)
-  --embed-quality JPEG quality for frames (default: 80)
 
 Frame Extraction:
   --scene-detection Use FFmpeg scene detection (recommended)
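The two help-text hunks move `--run-whisper` and `--whisper-model` from Main Options to Other and drop `--embed-quality`. The sketch below shows how that grouping could be expressed with `argparse.add_argument_group`. It is a hypothetical reconstruction, not the script's parser: only flags that appear in this README are used, but the types and the group placement of flags not visible in these hunks (`--scene-threshold`, `--interval`, `--no-deduplicate`, `--no-cache`, `--skip-cache-whisper`, `--skip-cache-frames`, `--verbose`) are guesses.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the option groups in the updated help text."""
    p = argparse.ArgumentParser(prog="process_meeting.py")

    main = p.add_argument_group("Main Options")
    main.add_argument("video", help="Path to video file")
    main.add_argument("--diarize", action="store_true",
                      help="Use WhisperX with speaker diarization")
    main.add_argument("--embed-images", action="store_true",
                      help="Add frame file references to transcript (recommended)")

    frames = p.add_argument_group("Frame Extraction")
    frames.add_argument("--scene-detection", action="store_true",
                        help="Use FFmpeg scene detection (recommended)")
    frames.add_argument("--scene-threshold", type=float, default=15)  # guessed type
    frames.add_argument("--interval", type=float)                     # guessed type
    frames.add_argument("--no-deduplicate", action="store_true")      # guessed group

    cache = p.add_argument_group("Caching")
    cache.add_argument("--no-cache", action="store_true")
    cache.add_argument("--skip-cache-whisper", action="store_true")
    cache.add_argument("--skip-cache-frames", action="store_true")
    cache.add_argument("--skip-cache-analysis", action="store_true",
                       help="Re-run analysis only")

    other = p.add_argument_group("Other")
    other.add_argument("--run-whisper", action="store_true",
                       help="Run Whisper (without diarization)")
    other.add_argument("--whisper-model", default="medium",
                       choices=["tiny", "base", "small", "medium", "large"])
    other.add_argument("--transcript", "-t",
                       help="Path to existing Whisper transcript (JSON or TXT)")
    other.add_argument("--output", "-o", help="Output file for enhanced transcript")
    other.add_argument("--output-dir", default="output/",
                       help="Directory for output files (default: output/)")
    other.add_argument("--verbose", action="store_true")            # guessed group
    return p
```

Argument groups only change how `--help` is laid out; parsing behaves the same whichever group a flag sits in, so moving `--run-whisper` to Other does not change any command line.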
@@ -241,6 +215,8 @@ Caching:
   --skip-cache-analysis Re-run analysis only
 
 Other:
+  --run-whisper Run Whisper (without diarization)
+  --whisper-model Whisper model: tiny, base, small, medium, large (default: medium)
   --transcript, -t Path to existing Whisper transcript (JSON or TXT)
   --output, -o Output file for enhanced transcript
   --output-dir Directory for output files (default: output/)
@@ -262,10 +238,6 @@ Other:
 - **Whisper** (`--run-whisper`): Standard transcription, fast
 - **WhisperX** (`--run-whisper --diarize`): Adds speaker identification, requires HuggingFace token
 
-### Frame Quality
-- Default quality (80) works well for most cases
-- Use `--embed-quality 60` for smaller files if storage is a concern
-
 ### Deduplication
 - Enabled by default - removes similar consecutive frames
 - Disable with `--no-deduplicate` if slides/screens change subtly
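The last hunk leaves the Deduplication notes in place: it is on by default, drops similar consecutive frames, and can be turned off with `--no-deduplicate` when slides change subtly. How similarity is measured isn't documented, so the following is a minimal sketch using mean absolute difference on downscaled grayscale pixels, with a hypothetical threshold of 5. It takes already-decoded pixel lists so it runs with the standard library alone; decoding and resizing the JPEGs (for example with Pillow) are left out, and `mean_abs_diff` and `deduplicate` are hypothetical names.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-length grayscale pixel lists (0-255)."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def deduplicate(frames, threshold=5.0):
    """Drop frames that look almost identical to the last frame kept.

    frames:    (path, pixels) tuples in time order; pixels is a flat list of
               grayscale values from a downscaled copy of the image.
    threshold: hypothetical cut-off on the mean absolute difference.
    Returns the paths of the frames that are kept.
    """
    kept = []
    for path, pixels in frames:
        # Compare with the last *kept* frame, not the immediately preceding one.
        if not kept or mean_abs_diff(kept[-1][1], pixels) >= threshold:
            kept.append((path, pixels))
    return [path for path, _ in kept]
```

Comparing against the last kept frame means a run of tiny changes eventually adds up to a kept frame, but a single subtle change can still fall under the threshold, which is the case `--no-deduplicate` is documented for.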