updated readme

This commit is contained in:
Mariano Gabriel
2025-12-04 20:24:52 -03:00
parent 331cccb15f
commit eb8b1f4f11

View File

@@ -46,25 +46,19 @@ For speaker diarization, you'll need a HuggingFace token with access to pyannote
## Quick Start
### Recommended: Embed Frames with Scene Detection
### Recommended Usage
```bash
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
```
This will:
1. Run Whisper transcription (audio → text)
2. Extract frames at scene changes (smarter than fixed intervals)
3. Embed frame references in the transcript for LLM analysis
1. Run WhisperX transcription with speaker diarization
2. Extract frames at scene changes (threshold 10 = moderately sensitive)
3. Create an enhanced transcript with frame file references
4. Save everything to `output/` folder
### With Speaker Diarization (WhisperX)
```bash
python process_meeting.py samples/meeting.mkv --run-whisper --diarize --embed-images --scene-detection
```
This uses WhisperX to identify different speakers in the transcript.
The `--embed-images` flag adds frame paths to the transcript (e.g., `Frame: frames/video_00257.jpg`), keeping the transcript small while frames stay in `frames/` folder for LLM access.
### Re-run with Cached Results
@@ -76,48 +70,38 @@ python process_meeting.py samples/meeting.mkv --embed-images
# Skip only specific cached items
python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-frames
python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-whisper
python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-analysis
# Force complete reprocessing
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --no-cache
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --no-cache
```
## Usage Examples
### Scene Detection Options
```bash
# Default scene detection (threshold: 15)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
# Default threshold (15)
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize
# More sensitive (more frames captured, threshold: 5)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --scene-threshold 5
# More sensitive (more frames, threshold: 5)
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --diarize
# Less sensitive (fewer frames, threshold: 30)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --scene-threshold 30
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 30 --diarize
```
### Fixed Interval Extraction (alternative to scene detection)
```bash
# Every 10 seconds
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --interval 10
python process_meeting.py samples/meeting.mkv --embed-images --interval 10 --diarize
# Every 3 seconds (more detailed)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --interval 3
```
### Frame Quality Options
```bash
# Default quality (80)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
# Lower quality for smaller files (60)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --embed-quality 60
python process_meeting.py samples/meeting.mkv --embed-images --interval 3 --diarize
```
### Caching Examples
```bash
# First run - processes everything
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
# Iterate on scene threshold (reuse whisper transcript)
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --skip-cache-frames --skip-cache-analysis
@@ -126,17 +110,17 @@ python process_meeting.py samples/meeting.mkv --embed-images --scene-detection -
python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-whisper
# Force complete reprocessing
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --no-cache
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --no-cache
```
### Custom output location
```bash
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --output-dir my_outputs/
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --output-dir my_outputs/
```
### Enable verbose logging
```bash
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --verbose
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --verbose
```
## Output Files
@@ -175,24 +159,17 @@ This allows you to iterate on scene detection thresholds without re-running Whis
### Complete Workflow (One Command!)
```bash
# Process everything in one step with scene detection
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
# With speaker diarization
python process_meeting.py samples/meeting.mkv --run-whisper --diarize --embed-images --scene-detection
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
```
### Typical Iterative Workflow
```bash
# First run - full processing
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
# Adjust scene threshold (keeps cached whisper transcript)
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --skip-cache-frames --skip-cache-analysis
# Try different frame quality
python process_meeting.py samples/meeting.mkv --embed-images --embed-quality 60 --skip-cache-frames --skip-cache-analysis
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --skip-cache-frames --skip-cache-analysis
```
### Example Prompt for Claude
@@ -223,11 +200,8 @@ usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--run-whisper]
Main Options:
video Path to video file
--run-whisper Run Whisper transcription before processing
--whisper-model Whisper model: tiny, base, small, medium, large (default: medium)
--diarize Use WhisperX with speaker diarization
--embed-images Embed frame references for LLM analysis (recommended)
--embed-quality JPEG quality for frames (default: 80)
--embed-images Add frame file references to transcript (recommended)
Frame Extraction:
--scene-detection Use FFmpeg scene detection (recommended)
@@ -241,6 +215,8 @@ Caching:
--skip-cache-analysis Re-run analysis only
Other:
--run-whisper Run Whisper (without diarization)
--whisper-model Whisper model: tiny, base, small, medium, large (default: medium)
--transcript, -t Path to existing Whisper transcript (JSON or TXT)
--output, -o Output file for enhanced transcript
--output-dir Directory for output files (default: output/)
@@ -262,10 +238,6 @@ Other:
- **Whisper** (`--run-whisper`): Standard transcription, fast
- **WhisperX** (`--run-whisper --diarize`): Adds speaker identification, requires HuggingFace token
### Frame Quality
- Default quality (80) works well for most cases
- Use `--embed-quality 60` for smaller files if storage is a concern
### Deduplication
- Enabled by default - removes similar consecutive frames
- Disable with `--no-deduplicate` if slides/screens change subtly