refactor

2025-10-20 00:03:41 -03:00
parent a999bc9093
commit cd7b0aed07
11 changed files with 776 additions and 312 deletions
--- a/README.md
+++ b/README.md
@@ -184,22 +184,53 @@ python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --verbo

 ## Output Files

-All output files are saved to the `output/` directory by default:
+Each video gets its own timestamped output directory:

-- **`output/<video>_enhanced.txt`** - Enhanced transcript ready for AI summarization
-- **`output/<video>.json`** - Whisper transcript (if `--run-whisper` was used)
-- **`output/<video>_vision.json`** - Vision analysis results with timestamps (if `--use-vision`)
-- **`output/<video>_ocr.json`** - OCR results with timestamps (if using OCR)
-- **`frames/`** - Extracted video frames (JPG files)
+```
+output/
+└── 20241019_143022-meeting/
+    ├── manifest.json                    # Processing configuration
+    ├── meeting_enhanced.txt             # Enhanced transcript for AI
+    ├── meeting.json                     # Whisper transcript
+    ├── meeting_vision.json              # Vision analysis results
+    └── frames/                          # Extracted video frames
+        ├── frame_00001_5.00s.jpg
+        ├── frame_00002_10.00s.jpg
+        └── ...
+```
+
+### Manifest File
+
+Each processing run creates a `manifest.json` that tracks:
+- Video information (name, path)
+- Processing timestamp
+- Configuration used (Whisper model, vision settings, etc.)
+- Output file locations
+
+Example manifest:
+```json
+{
+  "video": {
+    "name": "meeting.mkv",
+    "path": "/full/path/to/meeting.mkv"
+  },
+  "processed_at": "2024-10-19T14:30:22",
+  "configuration": {
+    "whisper": {"enabled": true, "model": "base"},
+    "analysis": {"method": "vision", "vision_model": "llava:13b", "vision_context": "code"}
+  }
+}
+```

 ### Caching Behavior

-The tool automatically caches intermediate results to speed up re-runs:
-- **Whisper transcript**: Cached as `output/<video>.json`
-- **Extracted frames**: Cached in `frames/<video>_*.jpg`
-- **Analysis results**: Cached as `output/<video>_vision.json` or `output/<video>_ocr.json`
+The tool automatically reuses the most recent output directory for the same video:
+- **First run**: Creates a new timestamped directory (e.g., `20241019_143022-meeting/`)
+- **Subsequent runs**: Reuses the same directory and cached results
+- **Cached items**: Whisper transcript, extracted frames, analysis results
+- **Force new run**: Use `--no-cache` to create a fresh directory

-Re-running with the same video will use cached results unless `--no-cache` is specified.
+This means you can switch between OCR and vision analysis without re-extracting frames, because both reuse the frames cached in the same directory.

 ## Workflow for Meeting Analysis

@@ -310,6 +341,15 @@ Options:
 - **`--vision-context dashboard`**: Extracts metrics, trends, panel names
 - **`--vision-context console`**: Captures commands, output, error messages

+**Customizing Prompts:**
+Prompts are stored as editable text files in `meetus/prompts/`:
+- `meeting.txt` - General meeting analysis
+- `code.txt` - Code screenshot analysis
+- `dashboard.txt` - Dashboard/monitoring analysis
+- `console.txt` - Terminal/console analysis
+
+Just edit these files to customize how the vision model analyzes your frames!
+
 ### Scene Detection vs Interval
 - **Scene detection**: Better for presentations with distinct slides. More efficient.
 - **Interval extraction**: Better for continuous screen sharing (coding, browsing). More thorough.
@@ -384,16 +424,31 @@ sudo apt-get install tesseract-ocr  # Don't forget system package!

 ```
 meetus/
-├── meetus/                  # Main package
+├── meetus/                     # Main package
 │   ├── __init__.py
-│   ├── frame_extractor.py   # Video frame extraction
-│   ├── ocr_processor.py     # OCR processing
-│   └── transcript_merger.py # Transcript merging
-├── process_meeting.py       # Main CLI script
-├── requirements.txt         # Python dependencies
-└── README.md               # This file
+│   ├── workflow.py             # Processing orchestrator
+│   ├── output_manager.py       # Output directory & manifest management
+│   ├── cache_manager.py        # Caching logic
+│   ├── frame_extractor.py      # Video frame extraction
+│   ├── vision_processor.py     # Vision model analysis (Ollama/LLaVA)
+│   ├── ocr_processor.py        # OCR processing
+│   ├── transcript_merger.py    # Transcript merging
+│   └── prompts/                # Vision analysis prompts (editable!)
+│       ├── meeting.txt         # General meeting analysis
+│       ├── code.txt            # Code screenshot analysis
+│       ├── dashboard.txt       # Dashboard/monitoring analysis
+│       └── console.txt         # Terminal/console analysis
+├── process_meeting.py          # Main CLI script (thin wrapper)
+├── requirements.txt            # Python dependencies
+├── output/                     # Timestamped output directories
+│   ├── .gitkeep
+│   └── YYYYMMDD_HHMMSS-video/  # Auto-generated per video
+├── samples/                    # Sample videos (gitignored)
+└── README.md                   # This file
 ```

+The code is modular and easy to extend - each module has a single responsibility.
+
 ## License

 For personal use.
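
The directory reuse described under "Caching Behavior" in the README hunk above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `output_manager.py` or `cache_manager.py`: the function names `find_latest_output_dir` and `resolve_output_dir`, their parameters, and the idea of matching directories by the video's file stem are all hypothetical. Only the `YYYYMMDD_HHMMSS-video` naming scheme and the `--no-cache` behavior come from the README text.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional


def find_latest_output_dir(output_root: Path, video_path: Path) -> Optional[Path]:
    """Return the newest `YYYYMMDD_HHMMSS-<video stem>` directory, or None.

    Hypothetical helper: assumes directories are keyed by the video's stem.
    """
    if not output_root.is_dir():
        return None
    pattern = re.compile(r"^\d{8}_\d{6}-" + re.escape(video_path.stem) + r"$")
    matches = [
        p for p in output_root.iterdir()
        if p.is_dir() and pattern.match(p.name)
    ]
    # The timestamp prefix is fixed-width, so sorting by name is chronological.
    return max(matches, key=lambda p: p.name, default=None)


def resolve_output_dir(
    output_root: Path,
    video_path: Path,
    use_cache: bool = True,
    now: Optional[datetime] = None,
) -> Path:
    """Reuse the latest directory for this video, or create a fresh one.

    `use_cache=False` stands in for the `--no-cache` flag (assumed mapping).
    """
    if use_cache:
        existing = find_latest_output_dir(output_root, video_path)
        if existing is not None:
            return existing
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    new_dir = output_root / f"{stamp}-{video_path.stem}"
    new_dir.mkdir(parents=True, exist_ok=True)
    return new_dir
```

With this sketch, calling `resolve_output_dir(Path("output"), Path("samples/meeting.mkv"))` twice would return the same directory on the second call, while passing `use_cache=False` would always create a new timestamped one. Sorting by directory name rather than by modification time keeps the choice stable even if cached files inside an older directory are touched later.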