sysmonstm/CLAUDE.md
2026-01-22 12:55:50 -03:00

# Distributed System Monitoring Platform
## Project Overview
A real-time system monitoring platform that streams metrics from multiple machines to a central hub with a live web dashboard. Built to demonstrate production microservices patterns (gRPC, FastAPI, streaming, event-driven architecture) while solving a real problem: monitoring development infrastructure across multiple machines.
- **Primary Goal:** Portfolio project demonstrating real-time streaming architecture
- **Secondary Goal:** A genuinely useful tool for monitoring a multi-machine development environment
- **Status:** Working MVP, deployed at sysmonstm.mcrn.ar
## Deployment Modes
### Production (3-tier)
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Collector  │────▶│     Hub     │────▶│    Edge     │
│ (each host) │     │   (local)   │     │    (AWS)    │
└─────────────┘     └─────────────┘     └─────────────┘
```
- **Collector** (`ctrl/collector/`) - Lightweight agent on each monitored machine
- **Hub** (`ctrl/hub/`) - Local aggregator, receives from collectors, forwards to edge
- **Edge** (`ctrl/edge/`) - Cloud dashboard, public-facing
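
The collector → hub → edge flow above can be sketched in a single process, with `asyncio.Queue` objects standing in for the WebSocket links. This is a minimal illustration under stated assumptions, not the project's code: the function names, the payload shape, and the `STOP` sentinel are all hypothetical.

```python
import asyncio

STOP = None  # hypothetical sentinel marking the end of a stream


async def collector(host: str, out: asyncio.Queue, samples: int) -> None:
    """Emit a few fake metric samples for one host, then signal completion."""
    for seq in range(samples):
        await out.put({"host": host, "seq": seq, "cpu_percent": 10.0 * seq})
    await out.put(STOP)


async def hub(inbox: asyncio.Queue, outbox: asyncio.Queue, collectors: int) -> None:
    """Receive from every collector and forward each sample to the edge."""
    finished = 0
    while finished < collectors:
        item = await inbox.get()
        if item is STOP:
            finished += 1
            continue
        await outbox.put(item)
    await outbox.put(STOP)


async def edge(inbox: asyncio.Queue) -> dict[str, dict]:
    """Keep only the latest sample per host, as a dashboard view would."""
    latest: dict[str, dict] = {}
    while (item := await inbox.get()) is not STOP:
        latest[item["host"]] = item
    return latest


async def run_pipeline() -> dict[str, dict]:
    to_hub: asyncio.Queue = asyncio.Queue()
    to_edge: asyncio.Queue = asyncio.Queue()
    results = await asyncio.gather(
        collector("mcrn", to_hub, 3),
        collector("nfrt", to_hub, 3),
        hub(to_hub, to_edge, collectors=2),
        edge(to_edge),
    )
    return results[-1]  # the edge's per-host view


latest = asyncio.run(run_pipeline())
```

The hub is the only component that talks to both sides, which is what lets the collectors stay lightweight and the edge stay unaware of how many hosts exist.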
### Development (Full Stack)
```bash
docker compose up # Uses ctrl/dev/docker-compose.yml
```
- Full gRPC-based microservices architecture
- Services: aggregator, gateway, collector, alerts
- Storage: Redis (hot), TimescaleDB (historical)
## Directory Structure
```
sms/
├── services/           # gRPC-based microservices (dev stack)
│   ├── collector/      # gRPC client, streams to aggregator
│   ├── aggregator/     # gRPC server, stores in Redis/TimescaleDB
│   ├── gateway/        # FastAPI, bridges gRPC to WebSocket
│   └── alerts/         # Event subscriber for threshold alerts
├── ctrl/               # Deployment configurations
│   ├── collector/      # Lightweight WebSocket collector
│   ├── hub/            # Local aggregator
│   ├── edge/           # Cloud dashboard
│   └── dev/            # Full stack docker-compose
├── proto/              # Protocol Buffer definitions
├── shared/             # Shared Python modules
├── web/                # Dashboard templates and static files
├── infra/              # Terraform for AWS deployment
└── k8s/                # Kubernetes manifests
```
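
The `alerts/` service in the tree above is described as an event subscriber for threshold alerts. A threshold check of that kind might look like the following sketch. The rule fields, metric names, and limits are hypothetical, and the event subscription itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str   # key in the metrics event, e.g. "cpu_percent"
    limit: float  # alert when the value is strictly above this


# Hypothetical rules; real limits would come from configuration.
RULES = (Threshold("cpu_percent", 90.0), Threshold("memory_percent", 85.0))


def evaluate(event: dict, rules: tuple[Threshold, ...] = RULES) -> list[str]:
    """Return one alert message per rule that the event violates."""
    messages = []
    for rule in rules:
        value = event.get(rule.metric)
        if value is not None and value > rule.limit:
            host = event.get("host", "unknown")
            messages.append(f"{host}: {rule.metric}={value} exceeds {rule.limit}")
    return messages


alerts = evaluate({"host": "nfrt", "cpu_percent": 97.5, "memory_percent": 40.0})
```

Metrics missing from an event are skipped rather than treated as violations, so a partial sample never raises a false alert.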
## Current Setup
**Machines being monitored:**
- `mcrn` - Primary workstation (runs hub + collector)
- `nfrt` - Secondary machine (runs collector only)
**Topology:**
```
mcrn                         nfrt                    AWS
├── hub ◄─────────────────── collector               edge (sysmonstm.mcrn.ar)
│    │                                                ▲
│    └────────────────────────────────────────────────┘
└── collector
```
## Technical Stack
### Core Technologies
- **Python 3.11+** - Primary language
- **FastAPI** - Web gateway, REST endpoints, WebSocket streaming
- **gRPC** - Inter-service communication (dev stack)
- **WebSockets** - Production deployment communication
- **psutil** - System metrics collection
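
As a rough illustration of the collection step, the sketch below gathers one sample with psutil and serializes it as JSON. The field names and the JSON payload format are assumptions, not the project's wire format, and the import is guarded so the example still runs (with `None` values) where psutil is not installed.

```python
import json
import socket
import time

try:
    import psutil  # third-party; listed in the stack above
except ImportError:  # keep the sketch runnable without it
    psutil = None


def collect_sample() -> dict:
    """Gather one metrics sample; metric values are None when psutil is missing."""
    sample = {
        "host": socket.gethostname(),
        "timestamp": time.time(),
        "cpu_percent": None,
        "memory_percent": None,
        "disk_percent": None,
    }
    if psutil is not None:
        # interval=0.1 blocks briefly so the CPU reading covers a real window
        sample["cpu_percent"] = psutil.cpu_percent(interval=0.1)
        sample["memory_percent"] = psutil.virtual_memory().percent
        sample["disk_percent"] = psutil.disk_usage("/").percent
    return sample


payload = json.dumps(collect_sample())
```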
### Storage (Dev Stack Only)
- **PostgreSQL/TimescaleDB** - Time-series historical data
- **Redis** - Current state, caching, event pub/sub
### Infrastructure
- **Docker Compose** - Orchestration
- **Woodpecker CI** - Build pipeline at `ppl/pipelines/sysmonstm/`
- **Registry** - `registry.mcrn.ar/sysmonstm/`
## Images
| Image | Purpose |
|-------|---------|
| `collector` | Lightweight WebSocket collector for production |
| `hub` | Local aggregator for production |
| `edge` | Cloud dashboard for production |
| `aggregator` | gRPC aggregator (dev stack) |
| `gateway` | FastAPI gateway (dev stack) |
| `collector-grpc` | gRPC collector (dev stack) |
| `alerts` | Alert service (dev stack) |
## Development Guidelines
### Code Quality
- Type hints throughout (Python 3.11+ syntax)
- Async/await used consistently
- Logging (not print statements)
- Error handling at boundaries
### Docker
- Multi-stage builds for smaller images
- `--network host` for collectors (accurate network metrics)
### Configuration
- Environment variables for all config
- Sensible defaults
- No secrets in code
## Interview/Portfolio Talking Points
### Architecture Decisions
- "3-tier for production: collector → hub → edge"
- "Hub allows local aggregation and buffering before forwarding to cloud"
- "Edge terminology shows awareness of edge computing patterns"
- "Full gRPC stack for development demonstrates microservices patterns"
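
To make the buffering point concrete, here is one possible shape for a hub-side buffer: a bounded in-memory queue that drops the oldest samples when full and keeps unsent samples if the uplink fails. The class name, the drop-oldest policy, and the use of `ConnectionError` to signal a failed send are assumptions; the source does not describe how the hub actually buffers.

```python
from collections import deque
from collections.abc import Callable


class ForwardBuffer:
    """Bounded in-memory buffer: the oldest samples are dropped when full."""

    def __init__(self, max_samples: int) -> None:
        self._pending: deque[dict] = deque(maxlen=max_samples)

    def __len__(self) -> int:
        return len(self._pending)

    def add(self, sample: dict) -> None:
        self._pending.append(sample)

    def flush(self, send: Callable[[dict], object]) -> int:
        """Send pending samples oldest-first; stop and keep the rest on failure."""
        sent = 0
        while self._pending:
            try:
                send(self._pending[0])
            except ConnectionError:
                break
            self._pending.popleft()  # remove only after a successful send
            sent += 1
        return sent


buffer = ForwardBuffer(max_samples=3)
for seq in range(5):
    buffer.add({"seq": seq})

delivered: list[dict] = []
sent = buffer.flush(delivered.append)
```

Because the buffer lives in memory, anything still pending is lost if the hub restarts, which is the durability cost of keeping the hub operationally simple.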
### Trade-offs
- Production vs Dev: simplicity/cost vs full architecture demo
- WebSocket vs gRPC: browser compatibility vs efficiency
- In-memory vs persistent: operational simplicity vs durability