simple is better
155
CLAUDE.md
@@ -2,131 +2,90 @@

## Project Overview

A real-time system monitoring platform that streams metrics from multiple machines to a central hub with a live web dashboard. Built to demonstrate production microservices patterns (gRPC, FastAPI, streaming, event-driven architecture) while solving a real problem: monitoring development infrastructure across multiple machines.
A real-time system monitoring platform that streams metrics from multiple machines to a central hub with a live web dashboard. Built to demonstrate production microservices patterns (gRPC, FastAPI, streaming, event-driven architecture).

**Primary Goal:** Portfolio project demonstrating real-time streaming architecture
**Secondary Goal:** Actually useful tool for monitoring a multi-machine development environment
**Status:** Working MVP, deployed at sysmonstm.mcrn.ar
**Primary Goal:** Portfolio project demonstrating real-time streaming with gRPC
**Status:** Working, deployed at sysmonstm.mcrn.ar

## Deployment Modes

### Production (3-tier)
## Architecture

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Collector  │────▶│     Hub     │────▶│    Edge     │
│ (each host) │     │   (local)   │     │    (AWS)    │
└─────────────┘     └─────────────┘     └─────────────┘
┌─────────────┐     ┌─────────────────────────────────────┐     ┌─────────────┐
│  Collector  │────▶│  Aggregator + Gateway + Redis + TS  │────▶│    Edge     │────▶ Browser
│   (mcrn)    │gRPC │               (LOCAL)               │ WS  │    (AWS)    │  WS
└─────────────┘     └─────────────────────────────────────┘     └─────────────┘
┌─────────────┐                    │
│  Collector  │────────────────────┘
│   (nfrt)    │gRPC
└─────────────┘
```

- **Collector** (`ctrl/collector/`) - Lightweight agent on each monitored machine
- **Hub** (`ctrl/hub/`) - Local aggregator, receives from collectors, forwards to edge
- **Edge** (`ctrl/edge/`) - Cloud dashboard, public-facing

### Development (Full Stack)

```bash
docker compose up  # Uses ctrl/dev/docker-compose.yml
```

- Full gRPC-based microservices architecture
- Services: aggregator, gateway, collector, alerts
- Storage: Redis (hot), TimescaleDB (historical)
- **Collectors** (`services/collector/`) - gRPC clients on each monitored machine
- **Aggregator** (`services/aggregator/`) - gRPC server, stores in Redis/TimescaleDB
- **Gateway** (`services/gateway/`) - FastAPI, bridges gRPC to WebSocket, forwards to edge
- **Edge** (`ctrl/edge/`) - Simple WebSocket relay for AWS, serves public dashboard

## Directory Structure

```
sms/
├── services/           # gRPC-based microservices (dev stack)
├── services/           # gRPC-based microservices
│   ├── collector/      # gRPC client, streams to aggregator
│   ├── aggregator/     # gRPC server, stores in Redis/TimescaleDB
│   ├── gateway/        # FastAPI, bridges gRPC to WebSocket
│   ├── gateway/        # FastAPI, WebSocket, forwards to edge
│   └── alerts/         # Event subscriber for threshold alerts
│
├── ctrl/               # Deployment configurations
│   ├── collector/      # Lightweight WebSocket collector
│   ├── hub/            # Local aggregator
│   ├── edge/           # Cloud dashboard
│   └── dev/            # Full stack docker-compose
│   ├── dev/            # Full stack docker-compose
│   └── edge/           # Cloud dashboard (AWS)
│
├── proto/              # Protocol Buffer definitions
├── shared/             # Shared Python modules
├── web/                # Dashboard templates and static files
├── infra/              # Terraform for AWS deployment
└── k8s/                # Kubernetes manifests
├── shared/             # Shared Python modules (config, logging, events)
└── web/                # Dashboard templates and static files
```

## Current Setup
## Running

**Machines being monitored:**
- `mcrn` - Primary workstation (runs hub + collector)
- `nfrt` - Secondary machine (runs collector only)

**Topology:**
### Local Development
```bash
docker compose up
```
mcrn                         nfrt                   AWS
├── hub ◄─────────────────── collector              edge (sysmonstm.mcrn.ar)
│    │                                                ▲
│    └────────────────────────────────────────────────┘
└── collector

### With Edge Forwarding (to AWS)
```bash
EDGE_URL=wss://sysmonstm.mcrn.ar/ws docker compose up
```

### Collector on Remote Machine
```bash
docker run -d --network host \
  -e AGGREGATOR_URL=<local-ip>:50051 \
  -e MACHINE_ID=$(hostname) \
  registry.mcrn.ar/sysmonstm/collector:latest
```

## Technical Stack

### Core Technologies
- **Python 3.11+** - Primary language
- **FastAPI** - Web gateway, REST endpoints, WebSocket streaming
- **gRPC** - Inter-service communication (dev stack)
- **WebSockets** - Production deployment communication
- **psutil** - System metrics collection
- **Python 3.11+**
- **gRPC** - Collector to aggregator communication (showcased)
- **FastAPI** - Gateway REST/WebSocket
- **Redis** - Pub/Sub events, current state cache
- **TimescaleDB** - Historical metrics storage
- **WebSocket** - Gateway to edge, edge to browser

### Storage (Dev Stack Only)
- **PostgreSQL/TimescaleDB** - Time-series historical data
- **Redis** - Current state, caching, event pub/sub
## Key Files

### Infrastructure
- **Docker Compose** - Orchestration
- **Woodpecker CI** - Build pipeline at ppl/pipelines/sysmonstm/
- **Registry** - registry.mcrn.ar/sysmonstm/

| File | Purpose |
|------|---------|
| `proto/metrics.proto` | gRPC service and message definitions |
| `services/collector/main.py` | gRPC streaming client |
| `services/aggregator/main.py` | gRPC server, metric processing |
| `services/gateway/main.py` | WebSocket bridge, edge forwarding |
| `ctrl/edge/edge.py` | Simple WebSocket relay for AWS |

## Images
## Portfolio Talking Points

| Image | Purpose |
|-------|---------|
| `collector` | Lightweight WebSocket collector for production |
| `hub` | Local aggregator for production |
| `edge` | Cloud dashboard for production |
| `aggregator` | gRPC aggregator (dev stack) |
| `gateway` | FastAPI gateway (dev stack) |
| `collector-grpc` | gRPC collector (dev stack) |
| `alerts` | Alert service (dev stack) |

## Development Guidelines

### Code Quality
- Type hints throughout (Python 3.11+ syntax)
- Async/await patterns used consistently
- Logging (not print statements)
- Error handling at boundaries

### Docker
- Multi-stage builds for smaller images
- `--network host` for collectors (accurate network metrics)

### Configuration
- Environment variables for all config
- Sensible defaults
- No secrets in code

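The configuration rules above could look roughly like this for a collector. This is a sketch under assumptions: `AGGREGATOR_URL` and `MACHINE_ID` are the variables used in the `docker run` example, while the `localhost:50051` default, the `COLLECT_INTERVAL` variable, and the `CollectorConfig` and `load_config` names are hypothetical and not taken from `shared/`.

```python
import os
import socket
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectorConfig:
    aggregator_url: str
    machine_id: str
    interval_seconds: float


def load_config(env: Mapping[str, str] = os.environ) -> CollectorConfig:
    """Read settings from environment variables, falling back to defaults."""
    return CollectorConfig(
        # Assumed default: an aggregator on the same host, gRPC port 50051.
        aggregator_url=env.get("AGGREGATOR_URL", "localhost:50051"),
        # Fall back to the hostname, mirroring MACHINE_ID=$(hostname).
        machine_id=env.get("MACHINE_ID") or socket.gethostname(),
        # COLLECT_INTERVAL is a hypothetical variable, used here for illustration.
        interval_seconds=float(env.get("COLLECT_INTERVAL", "5")),
    )


if __name__ == "__main__":
    print(load_config())
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.
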
## Interview/Portfolio Talking Points

### Architecture Decisions
- "3-tier for production: collector → hub → edge"
- "Hub allows local aggregation and buffering before forwarding to cloud"
- "Edge terminology shows awareness of edge computing patterns"
- "Full gRPC stack for development demonstrates microservices patterns"

### Trade-offs
- Production vs Dev: simplicity/cost vs full architecture demo
- WebSocket vs gRPC: browser compatibility vs efficiency
- In-memory vs persistent: operational simplicity vs durability
- **gRPC streaming** - Efficient binary protocol for real-time metrics
- **Event-driven** - Redis Pub/Sub decouples processing from delivery
- **Edge pattern** - Heavy processing local, lightweight relay in cloud
- **Cost optimization** - ~$10/mo for public dashboard (data transfer, not requests)
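The "event-driven" point above can be illustrated with a small in-process stand-in for Redis Pub/Sub. The publisher only knows a channel name, so a state cache and a threshold alert can be attached independently. The `EventBus` class, the `metrics` channel, the 90% CPU limit, and the event fields are all assumptions made for this sketch, and it does not use the Redis client.

```python
from collections import defaultdict
from collections.abc import Callable

Handler = Callable[[dict], None]


class EventBus:
    """In-process stand-in for Redis Pub/Sub channels."""

    def __init__(self) -> None:
        self._handlers: defaultdict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._handlers[channel].append(handler)

    def publish(self, channel: str, event: dict) -> int:
        """Deliver the event and return how many handlers received it."""
        handlers = self._handlers.get(channel, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
latest: dict[str, dict] = {}  # current state per machine
alerts: list[str] = []        # alert messages raised so far


def cache_state(event: dict) -> None:
    latest[event["machine_id"]] = event


def check_threshold(event: dict, limit: float = 90.0) -> None:
    if event["cpu_percent"] > limit:
        alerts.append(f"{event['machine_id']}: cpu {event['cpu_percent']}% > {limit}%")


# Two independent consumers of the same channel.
bus.subscribe("metrics", cache_state)
bus.subscribe("metrics", check_threshold)

bus.publish("metrics", {"machine_id": "mcrn", "cpu_percent": 42.0})
bus.publish("metrics", {"machine_id": "nfrt", "cpu_percent": 97.5})
```

After the two publishes, `latest` holds one entry per machine and `alerts` contains a single message for `nfrt`. Adding another consumer needs no change to the publisher.
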