updated readme

2025-12-04 20:24:52 -03:00
parent 331cccb15f
commit eb8b1f4f11
1 changed files with 24 additions and 52 deletions
--- a/README.md
+++ b/README.md
@@ -46,25 +46,19 @@ For speaker diarization, you'll need a HuggingFace token with access to pyannote

 ## Quick Start

-### Recommended: Embed Frames with Scene Detection
+### Recommended Usage

 ```bash
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
 ```

 This will:
-1. Run Whisper transcription (audio → text)
-2. Extract frames at scene changes (smarter than fixed intervals)
-3. Embed frame references in the transcript for LLM analysis
+1. Run WhisperX transcription with speaker diarization
+2. Extract frames at scene changes (threshold 10 = moderately sensitive)
+3. Create an enhanced transcript with frame file references
 4. Save everything to `output/` folder

-### With Speaker Diarization (WhisperX)
-
-```bash
-python process_meeting.py samples/meeting.mkv --run-whisper --diarize --embed-images --scene-detection
-```
-
-This uses WhisperX to identify different speakers in the transcript.
+The `--embed-images` flag adds frame paths to the transcript (e.g., `Frame: frames/video_00257.jpg`), keeping the transcript small while frames stay in `frames/` folder for LLM access.

 ### Re-run with Cached Results

@@ -76,48 +70,38 @@ python process_meeting.py samples/meeting.mkv --embed-images
 # Skip only specific cached items
 python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-frames
 python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-whisper
-python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-analysis

 # Force complete reprocessing
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --no-cache
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --no-cache
 ```

 ## Usage Examples

 ### Scene Detection Options
 ```bash
-# Default scene detection (threshold: 15)
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
+# Default threshold (15)
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize

-# More sensitive (more frames captured, threshold: 5)
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --scene-threshold 5
+# More sensitive (more frames, threshold: 5)
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --diarize

 # Less sensitive (fewer frames, threshold: 30)
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --scene-threshold 30
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 30 --diarize
 ```

 ### Fixed Interval Extraction (alternative to scene detection)
 ```bash
 # Every 10 seconds
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --interval 10
+python process_meeting.py samples/meeting.mkv --embed-images --interval 10 --diarize

 # Every 3 seconds (more detailed)
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --interval 3
-```
-
-### Frame Quality Options
-```bash
-# Default quality (80)
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
-
-# Lower quality for smaller files (60)
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --embed-quality 60
+python process_meeting.py samples/meeting.mkv --embed-images --interval 3 --diarize
 ```

 ### Caching Examples
 ```bash
 # First run - processes everything
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize

 # Iterate on scene threshold (reuse whisper transcript)
 python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --skip-cache-frames --skip-cache-analysis
@@ -126,17 +110,17 @@ python process_meeting.py samples/meeting.mkv --embed-images --scene-detection -
 python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-whisper

 # Force complete reprocessing
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --no-cache
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --no-cache
 ```

 ### Custom output location
 ```bash
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --output-dir my_outputs/
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --output-dir my_outputs/
 ```

 ### Enable verbose logging
 ```bash
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --verbose
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --verbose
 ```

 ## Output Files
@@ -175,24 +159,17 @@ This allows you to iterate on scene detection thresholds without re-running Whis
 ### Complete Workflow (One Command!)

 ```bash
-# Process everything in one step with scene detection
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
-
-# With speaker diarization
-python process_meeting.py samples/meeting.mkv --run-whisper --diarize --embed-images --scene-detection
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
 ```

 ### Typical Iterative Workflow

 ```bash
 # First run - full processing
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize

 # Adjust scene threshold (keeps cached whisper transcript)
-python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --skip-cache-frames --skip-cache-analysis
-
-# Try different frame quality
-python process_meeting.py samples/meeting.mkv --embed-images --embed-quality 60 --skip-cache-frames --skip-cache-analysis
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --skip-cache-frames --skip-cache-analysis
 ```

 ### Example Prompt for Claude
@@ -223,11 +200,8 @@ usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--run-whisper]

 Main Options:
  video                   Path to video file
-  --run-whisper           Run Whisper transcription before processing
-  --whisper-model         Whisper model: tiny, base, small, medium, large (default: medium)
  --diarize               Use WhisperX with speaker diarization
-  --embed-images          Embed frame references for LLM analysis (recommended)
-  --embed-quality         JPEG quality for frames (default: 80)
+  --embed-images          Add frame file references to transcript (recommended)

 Frame Extraction:
  --scene-detection       Use FFmpeg scene detection (recommended)
@@ -241,6 +215,8 @@ Caching:
  --skip-cache-analysis   Re-run analysis only

 Other:
+  --run-whisper           Run Whisper (without diarization)
+  --whisper-model         Whisper model: tiny, base, small, medium, large (default: medium)
  --transcript, -t        Path to existing Whisper transcript (JSON or TXT)
  --output, -o            Output file for enhanced transcript
  --output-dir            Directory for output files (default: output/)
@@ -262,10 +238,6 @@ Other:
 - **Whisper** (`--run-whisper`): Standard transcription, fast
 - **WhisperX** (`--run-whisper --diarize`): Adds speaker identification, requires HuggingFace token

-### Frame Quality
- Default quality (80) works well for most cases
- Use `--embed-quality 60` for smaller files if storage is a concern
-
 ### Deduplication
 - Enabled by default - removes similar consecutive frames
 - Disable with `--no-deduplicate` if slides/screens change subtly