updated readme

2025-12-04 20:24:52 -03:00
parent 331cccb15f
commit eb8b1f4f11
1 changed file with 24 additions and 52 deletions
--- a/README.md
+++ b/README.md
@@ -46,25 +46,19 @@ For speaker diarization, you'll need a HuggingFace token with access to pyannote
 ## Quick Start
-### Recommended: Embed Frames with Scene Detection
+### Recommended Usage
 ```bash
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
 ```
 This will:
-1. Run Whisper transcription (audio → text)
+1. Run WhisperX transcription with speaker diarization
-2. Extract frames at scene changes (smarter than fixed intervals)
+2. Extract frames at scene changes (threshold 10 = moderately sensitive)
-3. Embed frame references in the transcript for LLM analysis
+3. Create an enhanced transcript with frame file references
 4. Save everything to the `output/` folder
-### With Speaker Diarization (WhisperX)
+The `--embed-images` flag adds frame paths to the transcript (e.g., `Frame: frames/video_00257.jpg`), keeping the transcript small while the frames stay in the `frames/` folder for LLM access.
 ```bash
 python process_meeting.py samples/meeting.mkv --run-whisper --diarize --embed-images --scene-detection
 ```
 This uses WhisperX to identify different speakers in the transcript.
 ### Re-run with Cached Results
@@ -76,48 +70,38 @@ python process_meeting.py samples/meeting.mkv --embed-images
 # Skip only specific cached items
 python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-frames
 python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-whisper
 python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-analysis
 # Force complete reprocessing
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --no-cache
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --no-cache
 ```
 ## Usage Examples
 ### Scene Detection Options
 ```bash
-# Default scene detection (threshold: 15)
+# Default threshold (15)
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize
-# More sensitive (more frames captured, threshold: 5)
+# More sensitive (more frames, threshold: 5)
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --scene-threshold 5
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --diarize
 # Less sensitive (fewer frames, threshold: 30)
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --scene-threshold 30
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 30 --diarize
 ```
 ### Fixed Interval Extraction (alternative to scene detection)
 ```bash
 # Every 10 seconds
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --interval 10
+python process_meeting.py samples/meeting.mkv --embed-images --interval 10 --diarize
 # Every 3 seconds (more detailed)
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --interval 3
+python process_meeting.py samples/meeting.mkv --embed-images --interval 3 --diarize
 ```
 ### Frame Quality Options
 ```bash
 # Default quality (80)
 python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
 # Lower quality for smaller files (60)
 python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --embed-quality 60
 ```
 ### Caching Examples
 ```bash
 # First run - processes everything
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
 # Iterate on scene threshold (reuse whisper transcript)
 python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --skip-cache-frames --skip-cache-analysis
@@ -126,17 +110,17 @@ python process_meeting.py samples/meeting.mkv --embed-images --scene-detection -
 python process_meeting.py samples/meeting.mkv --embed-images --skip-cache-whisper
 # Force complete reprocessing
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --no-cache
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --no-cache
 ```
 ### Custom Output Location
 ```bash
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --output-dir my_outputs/
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --output-dir my_outputs/
 ```
 ### Enable Verbose Logging
 ```bash
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection --verbose
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --diarize --verbose
 ```
 ## Output Files
@@ -175,24 +159,17 @@ This allows you to iterate on scene detection thresholds without re-running Whis
 ### Complete Workflow (One Command!)
 ```bash
-# Process everything in one step with scene detection
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
 python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
 # With speaker diarization
 python process_meeting.py samples/meeting.mkv --run-whisper --diarize --embed-images --scene-detection
 ```
 ### Typical Iterative Workflow
 ```bash
 # First run - full processing
-python process_meeting.py samples/meeting.mkv --run-whisper --embed-images --scene-detection
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --diarize
 # Adjust scene threshold (keeps cached whisper transcript)
-python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 10 --skip-cache-frames --skip-cache-analysis
+python process_meeting.py samples/meeting.mkv --embed-images --scene-detection --scene-threshold 5 --skip-cache-frames --skip-cache-analysis
 # Try different frame quality
 python process_meeting.py samples/meeting.mkv --embed-images --embed-quality 60 --skip-cache-frames --skip-cache-analysis
 ```
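The iterative workflow above can also be scripted. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `threshold_sweep` are hypothetical, and it uses only flags that appear in this README. It assumes the first run does the full processing with `--diarize`, and later runs reuse the cached transcript while re-extracting frames.

```python
import shlex


def build_command(video, threshold, first_run=False):
    """Build the argument list for one process_meeting.py run (hypothetical helper)."""
    cmd = [
        "python", "process_meeting.py", video,
        "--embed-images", "--scene-detection",
        "--scene-threshold", str(threshold),
    ]
    if first_run:
        # Full processing, including transcription with diarization.
        cmd.append("--diarize")
    else:
        # Reuse the cached transcript; redo only frames and analysis.
        cmd += ["--skip-cache-frames", "--skip-cache-analysis"]
    return cmd


def threshold_sweep(video, thresholds):
    """Return one command per threshold; only the first is a full run."""
    if not thresholds:
        return []
    first, *rest = thresholds
    commands = [build_command(video, first, first_run=True)]
    commands += [build_command(video, t) for t in rest]
    return commands


if __name__ == "__main__":
    for cmd in threshold_sweep("samples/meeting.mkv", [10, 5, 30]):
        # Print the command; pass cmd to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings avoids quoting problems if the video path contains spaces.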
 ### Example Prompt for Claude
@@ -223,11 +200,8 @@ usage: process_meeting.py [-h] [--transcript TRANSCRIPT] [--run-whisper]
 Main Options:
  video                   Path to video file
  --run-whisper           Run Whisper transcription before processing
  --whisper-model         Whisper model: tiny, base, small, medium, large (default: medium)
  --diarize               Use WhisperX with speaker diarization
-  --embed-images          Embed frame references for LLM analysis (recommended)
+  --embed-images          Add frame file references to transcript (recommended)
  --embed-quality         JPEG quality for frames (default: 80)
 Frame Extraction:
  --scene-detection       Use FFmpeg scene detection (recommended)
@@ -241,6 +215,8 @@ Caching:
  --skip-cache-analysis   Re-run analysis only
 Other:
  --run-whisper           Run Whisper (without diarization)
  --whisper-model         Whisper model: tiny, base, small, medium, large (default: medium)
  --transcript, -t        Path to existing Whisper transcript (JSON or TXT)
  --output, -o            Output file for enhanced transcript
  --output-dir            Directory for output files (default: output/)
@@ -262,10 +238,6 @@ Other:
 - **Whisper** (`--run-whisper`): Standard transcription, fast
 - **WhisperX** (`--run-whisper --diarize`): Adds speaker identification, requires HuggingFace token
 ### Frame Quality
 - Default quality (80) works well for most cases
 - Use `--embed-quality 60` for smaller files if storage is a concern
 ### Deduplication
 - Enabled by default; removes similar consecutive frames
 - Disable with `--no-deduplicate` if slides/screens change subtly
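To illustrate why deduplication can drop subtle slide changes, here is a minimal sketch of one way consecutive-frame deduplication might work. It is an assumption for illustration only, not the tool's actual method: frames are modeled as flat lists of grayscale pixel values, and the function names and the `min_diff` parameter are hypothetical.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized grayscale frames."""
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def deduplicate(frames, min_diff=2.0):
    """Return indices of frames that differ enough from the last kept frame."""
    kept = []
    for index, pixels in enumerate(frames):
        # Always keep the first frame; afterwards compare against the last kept one.
        if not kept or mean_abs_diff(frames[kept[-1]], pixels) >= min_diff:
            kept.append(index)
    return kept


frames = [
    [10, 10, 10, 10],
    [10, 11, 10, 10],      # nearly identical to frame 0, dropped
    [200, 200, 200, 200],  # large change, kept
    [200, 199, 200, 200],  # nearly identical to frame 2, dropped
]
print(deduplicate(frames))  # prints [0, 2]
```

In this model, a small edit to a slide produces a difference below the cutoff and the frame is discarded, which is the situation where `--no-deduplicate` is useful.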