updated readme

Mariano Gabriel
2025-12-04 20:24:52 -03:00
parent 331cccb15f
commit eb8b1f4f11


@@ -46,25 +46,19 @@ For speaker diarization, you'll need a HuggingFace token with access to pyannote
## Quick Start ## Quick Start
### Recommended: Embed Frames with Scene Detection ### Recommended Usage
```bash ```bash
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
``` ```
This will: This will:
1. Run Whisper transcription (audio → text) 1. Run WhisperX transcription with speaker diarization
2. Extract frames at scene changes (smarter than fixed intervals) 2. Extract frames at scene changes (threshold 10 = moderately sensitive)
3. Embed frame references in the transcript for LLM analysis 3. Create an enhanced transcript with frame file references
4. Save everything to `output/` folder 4. Save everything to `output/` folder
### With Speaker Diarization (WhisperX) The `--embed-images` flag adds frame paths to the transcript (e.g., `Frame: frames/video_00257.jpg`), keeping the transcript small while the frames stay in the `frames/` folder for LLM access.
```bash
python process_meeting.py samples/meeting.mkv --run-whisper --diarize --embed-images --scene-detection
```
This uses WhisperX to identify different speakers in the transcript.
### Re-run with Cached Results ### Re-run with Cached Results
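The new Quick Start text says `--embed-images` writes frame paths into the transcript as lines such as `Frame: frames/video_00257.jpg`. Here is a minimal Python sketch for collecting those references before handing the frames to an LLM. It assumes every reference follows that exact `Frame: <path>` pattern, because the diff does not show the rest of the transcript layout. `frame_refs` is a hypothetical helper, not part of `process_meeting.py`.

```python
import re

# Assumed reference format, based only on the README example `Frame: frames/video_00257.jpg`.
FRAME_RE = re.compile(r"Frame:[ \t]*(\S+\.(?:jpg|jpeg|png))")


def frame_refs(transcript_text: str) -> list[str]:
    """Return referenced frame paths in first-seen order, without duplicates."""
    seen: set[str] = set()
    refs: list[str] = []
    for match in FRAME_RE.finditer(transcript_text):
        path = match.group(1)
        if path not in seen:
            seen.add(path)
            refs.append(path)
    return refs


if __name__ == "__main__":
    # Illustrative input only; the real transcript layout may differ.
    sample = "...\nFrame: frames/video_00257.jpg\n...\n"
    print(frame_refs(sample))  # ['frames/video_00257.jpg']
```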
@@ -76,48 +70,38 @@ python process_meeting.py samples/meeting.mkv --embed-images
# Skip only specific cached items # Skip only specific cached items
python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-frames python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-frames
python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-whisper python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-whisper
python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-analysis
# Force complete reprocessing # Force complete reprocessing
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --no-cache python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --no-cache
``` ```
## Usage Examples ## Usage Examples
### Scene Detection Options ### Scene Detection Options
```bash ```bash
# Default scene detection (threshold: 15) # Default threshold (15)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize
# More sensitive (more frames captured, threshold: 5) # More sensitive (more frames, threshold: 5)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --scene-threshold 5 python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --diarize
# Less sensitive (fewer frames, threshold: 30) # Less sensitive (fewer frames, threshold: 30)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --scene-threshold 30 python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 30 --diarize
``` ```
### Fixed Interval Extraction (alternative to scene detection) ### Fixed Interval Extraction (alternative to scene detection)
```bash ```bash
# Every 10 seconds # Every 10 seconds
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --interval 10 python process_meeting.py samples/meeting.mkv --embed-images --interval 10 --diarize
# Every 3 seconds (more detailed) # Every 3 seconds (more detailed)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --interval 3 python process_meeting.py samples/meeting.mkv --embed-images --interval 3 --diarize
```
### Frame Quality Options
```bash
# Default quality (80)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
# Lower quality for smaller files (60)
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --embed-quality 60
``` ```
### Caching Examples ### Caching Examples
```bash ```bash
# First run - processes everything # First run - processes everything
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
# Iterate on scene threshold (reuse whisper transcript) # Iterate on scene threshold (reuse whisper transcript)
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --skip-cache-frames --skip-cache-analysis python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --skip-cache-frames --skip-cache-analysis
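`--scene-detection` is described as FFmpeg scene detection, and the examples use `--scene-threshold` as a sensitivity knob (5 gives more frames, 30 gives fewer). FFmpeg's `select` filter exposes a per-frame `scene` score between 0 and 1. The sketch below shows the kind of command such a flag could build. It assumes the README threshold is a percentage of that score (10 becomes 0.10) and that frames are named like the `video_00257.jpg` example. Both the mapping and the `scene_detect_cmd` helper are assumptions, not taken from `process_meeting.py`.

```python
def scene_detect_cmd(video: str, threshold: float,
                     out_pattern: str = "frames/video_%05d.jpg") -> list[str]:
    """Build an ffmpeg command that writes one JPEG per detected scene change.

    Assumes the README threshold (5, 10, 15, 30, ...) is a percentage of
    FFmpeg's 0..1 scene score, so lower values keep more frames.
    The output directory (frames/ by default) must already exist.
    """
    if not 0 < threshold < 100:
        raise ValueError("threshold must be between 0 and 100 (exclusive)")
    score = threshold / 100
    return [
        "ffmpeg", "-i", video,
        # Keep only frames whose scene-change score exceeds the cutoff.
        "-vf", f"select='gt(scene,{score:.2f})'",
        # Emit just the selected frames (FFmpeg 5.1+; older builds use -vsync vfr).
        "-fps_mode", "vfr",
        out_pattern,
    ]
```

To execute it, pass the list straight to `subprocess.run(scene_detect_cmd("samples/meeting.mkv", 10), check=True)`. Because no shell is involved, the single quotes reach FFmpeg's filter parser unchanged and protect the comma inside `gt(scene,...)`.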
@@ -126,17 +110,17 @@ python process_meeting.py samples/meeting.mkv --embed-images --scene-detection -
python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-whisper python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-whisper
# Force complete reprocessing # Force complete reprocessing
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --no-cache python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --no-cache
``` ```
### Custom output location ### Custom output location
```bash ```bash
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --output-dir my_outputs/ python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --output-dir my_outputs/
``` ```
### Enable verbose logging ### Enable verbose logging
```bash ```bash
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --verbose python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --verbose
``` ```
## Output Files ## Output Files
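The caching examples rely on four flags. `--no-cache` reprocesses everything, and each `--skip-cache-*` flag re-runs one stage. The iteration example passes `--skip-cache-frames` and `--skip-cache-analysis` together, which suggests that one flag does not imply the other. The following is a hypothetical `argparse` sketch of that mapping. Only the flag names come from the README. The stage names and the `stages_to_rerun` helper are assumptions.

```python
import argparse

STAGES = ("whisper", "frames", "analysis")  # hypothetical stage names


def build_parser() -> argparse.ArgumentParser:
    """Parser covering only the caching flags documented in the README."""
    parser = argparse.ArgumentParser(prog="process_meeting.py")
    parser.add_argument("video", help="Path to video file")
    caching = parser.add_argument_group("Caching")
    caching.add_argument("--no-cache", action="store_true",
                         help="Ignore all cached results")
    for stage in STAGES:
        caching.add_argument(f"--skip-cache-{stage}", action="store_true",
                             help=f"Re-run {stage} only")
    return parser


def stages_to_rerun(args: argparse.Namespace) -> set[str]:
    """Return the stages whose cached output should be ignored."""
    if args.no_cache:
        return set(STAGES)
    # argparse stores --skip-cache-frames as the attribute skip_cache_frames.
    return {s for s in STAGES if getattr(args, f"skip_cache_{s}")}


if __name__ == "__main__":
    # parse_known_args tolerates the other documented flags (e.g. --embed-images).
    args, _unknown = build_parser().parse_known_args(
        ["samples/meeting.mkv", "--embed-images",
         "--skip-cache-frames", "--skip-cache-analysis"])
    print(sorted(stages_to_rerun(args)))  # ['analysis', 'frames']
```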
@@ -175,24 +159,17 @@ This allows you to iterate on scene detection thresholds without re-running Whis
### Complete Workflow (One Command!) ### Complete Workflow (One Command!)
```bash ```bash
# Process everything in one step with scene detection python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
# With speaker diarization
python process_meeting.py samples/meeting.mkv --run-whisper --diarize --embed-images --scene-detection
``` ```
### Typical Iterative Workflow ### Typical Iterative Workflow
```bash ```bash
# First run - full processing # First run - full processing
python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
# Adjust scene threshold (keeps cached whisper transcript) # Adjust scene threshold (keeps cached whisper transcript)
python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --skip-cache-frames --skip-cache-analysis python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --skip-cache-frames --skip-cache-analysis
# Try different frame quality
python process_meeting.py samples/meeting.mkv --embed-images --embed-quality 60 --skip-cache-frames --skip-cache-analysis
``` ```
### Example Prompt for Claude ### Example Prompt for Claude
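The updated iterative workflow runs full processing once with `--diarize`. It then re-extracts frames at a new threshold and re-runs analysis while reusing the cached transcript. This small Python sketch generates that sequence for any list of thresholds. It uses only the flags shown in this diff, and `iteration_commands` is a hypothetical helper.

```python
import shlex


def iteration_commands(video: str, thresholds: list[int]) -> list[list[str]]:
    """Commands for a full first run followed by frame-only re-runs.

    Mirrors the README's iterative workflow: the first threshold runs
    everything (with diarization); each later threshold reuses the cached
    transcript and only re-extracts frames and re-runs analysis.
    """
    if not thresholds:
        raise ValueError("need at least one threshold")
    first, *rest = thresholds
    base = ["python", "process_meeting.py", video, "--embed-images", "--scene-detection"]
    cmds = [base + ["--scene-threshold", str(first), "--diarize"]]
    for t in rest:
        cmds.append(base + ["--scene-threshold", str(t),
                            "--skip-cache-frames", "--skip-cache-analysis"])
    return cmds


if __name__ == "__main__":
    for cmd in iteration_commands("samples/meeting.mkv", [10, 5]):
        print(shlex.join(cmd))
```

For thresholds `[10, 5]` this prints the two commands of the new "Typical Iterative Workflow". To execute them instead of printing, pass each list to `subprocess.run(cmd, check=True)`.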
@@ -223,11 +200,8 @@ usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--run-whisper]
Main Options: Main Options:
video Path to video file video Path to video file
--run-whisper Run Whisper transcription before processing
--whisper-model Whisper model: tiny, base, small, medium, large (default: medium)
--diarize Use WhisperX with speaker diarization --diarize Use WhisperX with speaker diarization
--embed-images Embed frame references for LLM analysis (recommended) --embed-images Add frame file references to transcript (recommended)
--embed-quality JPEG quality for frames (default: 80)
Frame Extraction: Frame Extraction:
--scene-detection Use FFmpeg scene detection (recommended) --scene-detection Use FFmpeg scene detection (recommended)
@@ -241,6 +215,8 @@ Caching:
--skip-cache-analysis Re-run analysis only --skip-cache-analysis Re-run analysis only
Other: Other:
--run-whisper Run Whisper (without diarization)
--whisper-model Whisper model: tiny, base, small, medium, large (default: medium)
--transcript, -t Path to existing Whisper transcript (JSON or TXT) --transcript, -t Path to existing Whisper transcript (JSON or TXT)
--output, -o Output file for enhanced transcript --output, -o Output file for enhanced transcript
--output-dir Directory for output files (default: output/) --output-dir Directory for output files (default: output/)
@@ -262,10 +238,6 @@ Other:
- **Whisper** (`--run-whisper`): Standard transcription, fast - **Whisper** (`--run-whisper`): Standard transcription, fast
- **WhisperX** (`--run-whisper --diarize`): Adds speaker identification, requires HuggingFace token - **WhisperX** (`--run-whisper --diarize`): Adds speaker identification, requires HuggingFace token
### Frame Quality
- Default quality (80) works well for most cases
- Use `--embed-quality 60` for smaller files if storage is a concern
### Deduplication ### Deduplication
- Enabled by default - removes similar consecutive frames - Enabled by default - removes similar consecutive frames
- Disable with `--no-deduplicate` if slides/screens change subtly - Disable with `--no-deduplicate` if slides/screens change subtly
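Deduplication is described only as removing similar consecutive frames. One common approach compares each frame with the last kept frame using the mean absolute difference of downscaled grayscale pixels (0-255). It drops the frame when that difference falls below a cutoff. The sketch below is dependency-free and is not the project's actual method. The `dedupe` helper, the pixel representation, and the default cutoff of 5.0 are all assumptions. It also shows why a subtle slide change can be discarded: a small edit yields a small difference, which is the case the README's `--no-deduplicate` advice covers.

```python
from typing import Sequence


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute difference between two equally sized grayscale thumbnails."""
    if len(a) != len(b) or not a:
        raise ValueError("thumbnails must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def dedupe(frames: list[Sequence[int]], cutoff: float = 5.0) -> list[int]:
    """Return indices of frames to keep, comparing each to the last kept frame."""
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[kept[-1]], frames[i]) >= cutoff:
            kept.append(i)
    return kept


if __name__ == "__main__":
    slide_a = [200] * 16        # made-up 4x4 thumbnail
    slide_a_noise = [201] * 16  # same slide, slight compression noise
    slide_b = [40] * 16         # a different slide
    print(dedupe([slide_a, slide_a_noise, slide_b]))  # [0, 2]
```

This sketch compares against the last kept frame instead of the immediately preceding one. That way, a slow transition made of many tiny steps still produces a kept frame once the accumulated change passes the cutoff.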