# Distributed System Monitoring Platform

## Project Overview

A real-time system monitoring platform that streams metrics from multiple machines to a central hub with a live web dashboard. Built to demonstrate production microservices patterns (gRPC, FastAPI, streaming, event-driven architecture) while solving a real problem: monitoring development infrastructure across multiple machines.

**Primary Goal:** Portfolio project demonstrating real-time streaming architecture

**Secondary Goal:** Actually useful tool for monitoring a multi-machine development environment

**Status:** Working MVP, deployed at sysmonstm.mcrn.ar

## Deployment Modes

### Production (3-tier)

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Collector  │────▶│     Hub     │────▶│    Edge     │
│ (each host) │     │   (local)   │     │    (AWS)    │
└─────────────┘     └─────────────┘     └─────────────┘
```

- **Collector** (`ctrl/collector/`) - Lightweight agent on each monitored machine
- **Hub** (`ctrl/hub/`) - Local aggregator; receives from collectors and forwards to edge
- **Edge** (`ctrl/edge/`) - Cloud dashboard, public-facing

### Development (Full Stack)

```bash
docker compose -f ctrl/dev/docker-compose.yml up
```

- Full gRPC-based microservices architecture
- Services: aggregator, gateway, collector, alerts
- Storage: Redis (hot), TimescaleDB (historical)

## Directory Structure

```
sms/
├── services/           # gRPC-based microservices (dev stack)
│   ├── collector/      # gRPC client, streams to aggregator
│   ├── aggregator/     # gRPC server, stores in Redis/TimescaleDB
│   ├── gateway/        # FastAPI, bridges gRPC to WebSocket
│   └── alerts/         # Event subscriber for threshold alerts
│
├── ctrl/               # Deployment configurations
│   ├── collector/      # Lightweight WebSocket collector
│   ├── hub/            # Local aggregator
│   ├── edge/           # Cloud dashboard
│   └── dev/            # Full stack docker-compose
│
├── proto/              # Protocol Buffer definitions
├── shared/             # Shared Python modules
├── web/                # Dashboard templates and static files
├── infra/              # Terraform for AWS deployment
└── k8s/                # Kubernetes manifests
```

## Current Setup

**Machines being monitored:**

- `mcrn` - Primary workstation (runs hub + collector)
- `nfrt` - Secondary machine (runs collector only)

**Topology:**

```
mcrn                        nfrt                  AWS
├── hub ◄────────────────── collector             edge (sysmonstm.mcrn.ar)
│    │                                             ▲
│    └─────────────────────────────────────────────┘
└── collector
```

## Technical Stack

### Core Technologies

- **Python 3.11+** - Primary language
- **FastAPI** - Web gateway, REST endpoints, WebSocket streaming
- **gRPC** - Inter-service communication (dev stack)
- **WebSockets** - Communication in the production deployment
- **psutil** - System metrics collection

### Storage (Dev Stack Only)

- **PostgreSQL/TimescaleDB** - Time-series historical data
- **Redis** - Current state, caching, event pub/sub

### Infrastructure

- **Docker Compose** - Orchestration
- **Woodpecker CI** - Build pipeline at `ppl/pipelines/sysmonstm/`
- **Registry** - `registry.mcrn.ar/sysmonstm/`

## Images

| Image | Purpose |
|-------|---------|
| `collector` | Lightweight WebSocket collector (production) |
| `hub` | Local aggregator (production) |
| `edge` | Cloud dashboard (production) |
| `aggregator` | gRPC aggregator (dev stack) |
| `gateway` | FastAPI gateway (dev stack) |
| `collector-grpc` | gRPC collector (dev stack) |
| `alerts` | Alert service (dev stack) |

## Development Guidelines

### Code Quality

- Type hints throughout (Python 3.11+ syntax)
- Consistent async/await patterns
- Logging instead of print statements
- Error handling at boundaries

### Docker

- Multi-stage builds for smaller images
- `--network host` for collectors, so network metrics reflect the host's interfaces rather than the container's virtual interface

### Configuration

- Environment variables for all config
- Sensible defaults
- No secrets in code

## Interview/Portfolio Talking Points

### Architecture Decisions

- "3-tier for production: collector → hub → edge"
- "Hub allows local aggregation and buffering before forwarding to cloud"
- "Edge terminology shows awareness of edge computing patterns"
- "Full gRPC stack for development demonstrates microservices patterns"

### Trade-offs

- Production vs. dev: simplicity/cost vs. full architecture demo
- WebSocket vs. gRPC: browser compatibility vs. efficiency
- In-memory vs. persistent: operational simplicity vs. durability
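The talking point that the hub "allows local aggregation and buffering before forwarding to cloud" can be illustrated with a small in-memory buffer. This is a sketch under stated assumptions: the `Hub` class, the `send` callable, and the drop-oldest policy are illustrative choices, not the project's implementation. The hub keeps the latest snapshot per host and queues every message for the edge. When the edge is unreachable, the failed message stays at the head of the queue and is retried on the next flush.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable

# Hypothetical transport: any coroutine that delivers one message to the edge.
Send = Callable[[dict], Awaitable[None]]


class Hub:
    """Keeps the latest snapshot per host and buffers messages for the edge."""

    def __init__(self, max_buffer: int = 1000) -> None:
        self.latest: dict[str, dict] = {}
        # deque(maxlen=...) discards the oldest item when full, so a long
        # edge outage costs old samples instead of unbounded hub memory.
        self.pending: deque[dict] = deque(maxlen=max_buffer)

    def receive(self, snapshot: dict) -> None:
        # Called for each message arriving from a collector.
        self.latest[snapshot["host"]] = snapshot
        self.pending.append(snapshot)

    async def flush(self, send: Send) -> int:
        """Forward buffered messages in order; stop at the first failure."""
        sent = 0
        while self.pending:
            msg = self.pending[0]
            try:
                await send(msg)
            except OSError:  # includes ConnectionError
                break  # edge unreachable: keep msg at the head, retry later
            self.pending.popleft()
            sent += 1
        return sent

    async def forward_forever(self, send: Send, interval_s: float = 1.0) -> None:
        # Background task: periodically drain the buffer toward the edge.
        while True:
            await self.flush(send)
            await asyncio.sleep(interval_s)
```

A bounded buffer is the in-memory side of the "in-memory vs. persistent" trade-off. A hub restart loses whatever is queued, and in exchange the hub needs no database.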
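The dev stack's `alerts` service is described only as an "event subscriber for threshold alerts". The following sketch shows the threshold logic such a subscriber might apply to each incoming snapshot. The `Rule`, `Alert`, and `ThresholdAlerts` names are hypothetical, and the Redis subscription is left out. The sketch fires once when a rule is first breached and re-arms only after the value recovers, so a host that stays hot does not produce an alert on every sample.

```python
import operator
from dataclasses import dataclass
from typing import Callable

# Supported comparisons, applied as: value <op> threshold
_OPS: dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class Rule:
    metric: str       # snapshot key, e.g. "cpu_percent" (hypothetical)
    threshold: float
    op: str = ">"


@dataclass(frozen=True)
class Alert:
    host: str
    rule: Rule
    value: float


class ThresholdAlerts:
    """Fires once when a rule is breached and re-arms when the host recovers."""

    def __init__(self, rules: list[Rule]) -> None:
        self.rules = rules
        self.firing: set[tuple[str, Rule]] = set()  # (host, rule) currently breached

    def process(self, snapshot: dict) -> list[Alert]:
        new_alerts: list[Alert] = []
        for rule in self.rules:
            value = snapshot.get(rule.metric)
            if value is None:
                continue  # this collector did not report the metric
            key = (snapshot["host"], rule)
            if _OPS[rule.op](value, rule.threshold):
                if key not in self.firing:  # only alert on the transition
                    self.firing.add(key)
                    new_alerts.append(Alert(snapshot["host"], rule, value))
            else:
                self.firing.discard(key)  # recovered: next breach alerts again
        return new_alerts
```

The state is keyed by `(host, rule)` rather than `(host, metric)`. This lets a warning rule and a critical rule on the same metric fire independently.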