refactor
@@ -184,22 +184,53 @@ python process_meeting.py samples/meeting.mkv --run-whisper --use-vision --verbo

## Output Files

Each video gets its own timestamped output directory:

```
output/
└── 20241019_143022-meeting/
    ├── manifest.json            # Processing configuration
    ├── meeting_enhanced.txt     # Enhanced transcript for AI
    ├── meeting.json             # Whisper transcript
    ├── meeting_vision.json      # Vision analysis results
    └── frames/                  # Extracted video frames
        ├── frame_00001_5.00s.jpg
        ├── frame_00002_10.00s.jpg
        └── ...
```
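The timestamped directory name shown above can be derived with a small sketch (a hypothetical helper for illustration, not necessarily the project's `output_manager.py` API):

```python
from datetime import datetime
from pathlib import Path

def make_run_dir(output_root, video_path, now=None):
    """Build a per-video run directory path like output/20241019_143022-meeting."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    stem = Path(video_path).stem  # "meeting" from "samples/meeting.mkv"
    return Path(output_root) / f"{stamp}-{stem}"

# For a run on 2024-10-19 at 14:30:22, this yields the directory above.
print(make_run_dir("output", "samples/meeting.mkv",
                   now=datetime(2024, 10, 19, 14, 30, 22)))
```

Because the timestamp prefix is zero-padded and most-significant-first, plain string sorting of these names orders runs chronologically.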

### Manifest File

Each processing run creates a `manifest.json` that tracks:

- Video information (name, path)
- Processing timestamp
- Configuration used (Whisper model, vision settings, etc.)
- Output file locations

Example manifest:

```json
{
  "video": {
    "name": "meeting.mkv",
    "path": "/full/path/to/meeting.mkv"
  },
  "processed_at": "2024-10-19T14:30:22",
  "configuration": {
    "whisper": {"enabled": true, "model": "base"},
    "analysis": {"method": "vision", "vision_model": "llava:13b", "vision_context": "code"}
  }
}
```
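Consuming the manifest is a one-liner with the standard library; here is a minimal sketch using the field names shown above (the helper name itself is hypothetical):

```python
import json
from pathlib import Path

def load_manifest(run_dir):
    """Summarize the key settings recorded in a run's manifest.json."""
    data = json.loads((Path(run_dir) / "manifest.json").read_text())
    cfg = data["configuration"]
    return {
        "video": data["video"]["name"],
        "whisper_model": cfg["whisper"]["model"] if cfg["whisper"]["enabled"] else None,
        "analysis_method": cfg["analysis"]["method"],
    }
```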

### Caching Behavior

The tool automatically reuses the most recent output directory for the same video:

- **First run**: Creates a new timestamped directory (e.g., `20241019_143022-meeting/`)
- **Subsequent runs**: Reuses the same directory and cached results
- **Cached items**: Whisper transcript, extracted frames, analysis results
- **Force new run**: Use `--no-cache` to create a fresh directory

Re-running the same video uses cached results unless `--no-cache` is specified, so you can instantly switch between OCR and vision analysis without re-extracting frames!

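One way the "reuse the most recent output directory" step could work is to pick the lexicographically greatest run directory for the video's stem, since the `YYYYMMDD_HHMMSS-` prefix sorts chronologically. A hypothetical sketch (not necessarily `cache_manager.py`'s exact logic):

```python
from pathlib import Path

def find_cached_run(output_root, video_stem):
    """Return the newest `YYYYMMDD_HHMMSS-<stem>` run directory, or None.

    The zero-padded timestamp prefix sorts chronologically as a plain
    string, so the lexicographic maximum is the most recent run.
    """
    root = Path(output_root)
    runs = [d for d in root.iterdir()
            if d.is_dir() and d.name.endswith(f"-{video_stem}")]
    return max(runs, key=lambda d: d.name, default=None)
```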
## Workflow for Meeting Analysis
@@ -310,6 +341,15 @@ Options:

- **`--vision-context dashboard`**: Extracts metrics, trends, panel names
- **`--vision-context console`**: Captures commands, output, error messages

**Customizing Prompts:**

Prompts are stored as editable text files in `meetus/prompts/`:

- `meeting.txt` - General meeting analysis
- `code.txt` - Code screenshot analysis
- `dashboard.txt` - Dashboard/monitoring analysis
- `console.txt` - Terminal/console analysis

Just edit these files to customize how the vision model analyzes your frames!

### Scene Detection vs Interval

- **Scene detection**: Better for presentations with distinct slides. More efficient.
- **Interval extraction**: Better for continuous screen sharing (coding, browsing). More thorough.

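For interval extraction, the frame filenames seen in the output tree (`frame_00001_5.00s.jpg`) suggest a simple sampling scheme. A hypothetical sketch, standing in for the real `frame_extractor.py`:

```python
def interval_frames(duration_s, interval_s):
    """List (timestamp, filename) pairs for fixed-interval extraction.

    Filenames follow the `frame_00001_5.00s.jpg` pattern from the
    output tree: a 5-digit frame index plus the timestamp in seconds.
    """
    frames = []
    t, n = interval_s, 1
    while t <= duration_s:
        frames.append((t, f"frame_{n:05d}_{t:.2f}s.jpg"))
        t, n = t + interval_s, n + 1
    return frames
```

Scene detection would instead emit a frame only when consecutive frames differ enough, which is why it is more efficient for slide decks but can miss gradual changes during continuous screen sharing.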
@@ -384,16 +424,31 @@ sudo apt-get install tesseract-ocr # Don't forget system package!
```
meetus/
├── meetus/                      # Main package
│   ├── __init__.py
│   ├── workflow.py              # Processing orchestrator
│   ├── output_manager.py        # Output directory & manifest management
│   ├── cache_manager.py         # Caching logic
│   ├── frame_extractor.py       # Video frame extraction
│   ├── vision_processor.py      # Vision model analysis (Ollama/LLaVA)
│   ├── ocr_processor.py         # OCR processing
│   ├── transcript_merger.py     # Transcript merging
│   └── prompts/                 # Vision analysis prompts (editable!)
│       ├── meeting.txt          # General meeting analysis
│       ├── code.txt             # Code screenshot analysis
│       ├── dashboard.txt        # Dashboard/monitoring analysis
│       └── console.txt          # Terminal/console analysis
├── process_meeting.py           # Main CLI script (thin wrapper)
├── requirements.txt             # Python dependencies
├── output/                      # Timestamped output directories
│   ├── .gitkeep
│   └── YYYYMMDD_HHMMSS-video/   # Auto-generated per video
├── samples/                     # Sample videos (gitignored)
└── README.md                    # This file
```

The code is modular and easy to extend - each module has a single responsibility.
## License

For personal use.