# Distributed System Monitoring Platform

## Project Overview

A real-time system monitoring platform that streams metrics from multiple machines to a central hub with a live web dashboard. It was built to demonstrate production microservices patterns (gRPC, FastAPI, streaming, event-driven architecture) while solving a real problem: monitoring development infrastructure across multiple machines.

- **Primary Goal:** Portfolio project demonstrating real-time streaming architecture
- **Secondary Goal:** A genuinely useful tool for monitoring a multi-machine development environment
- **Status:** Working MVP, deployed at sysmonstm.mcrn.ar

## Deployment Modes

### Production (3-tier)

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Collector  │────▶│     Hub     │────▶│    Edge     │
│ (each host) │     │   (local)   │     │    (AWS)    │
└─────────────┘     └─────────────┘     └─────────────┘
```

- **Collector** (`ctrl/collector/`) - Lightweight agent on each monitored machine
- **Hub** (`ctrl/hub/`) - Local aggregator, receives from collectors, forwards to edge
- **Edge** (`ctrl/edge/`) - Cloud dashboard, public-facing
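
The collector → hub → edge flow can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: the `Hub` class, the message fields (`host`, `ts`, `cpu_percent`), and `snapshot()` are hypothetical names, and the WebSocket transport is replaced by direct method calls to keep the example self-contained.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class Hub:
    """Hypothetical local aggregator: keeps the latest sample per host."""

    latest: dict[str, dict] = field(default_factory=dict)

    def receive(self, raw: str) -> None:
        # A collector would send this JSON over a WebSocket; here it is passed in directly.
        msg = json.loads(raw)
        self.latest[msg["host"]] = msg

    def snapshot(self) -> str:
        # Payload the hub would forward to the edge dashboard.
        return json.dumps({"hosts": self.latest}, sort_keys=True)


def collector_message(host: str, cpu_percent: float) -> str:
    """Build one collector sample (field names are assumptions)."""
    return json.dumps({"host": host, "ts": time.time(), "cpu_percent": cpu_percent})


hub = Hub()
hub.receive(collector_message("mcrn", 12.5))
hub.receive(collector_message("nfrt", 40.0))
hub.receive(collector_message("mcrn", 18.0))  # a newer sample replaces the older one
edge_payload = json.loads(hub.snapshot())
print(sorted(edge_payload["hosts"]))
```

Keeping only the latest sample per host is one possible design; a real hub might also keep a short history or buffer samples while the edge is unreachable.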

### Development (Full Stack)

```bash
docker compose up  # Uses ctrl/dev/docker-compose.yml
```

- Full gRPC-based microservices architecture
- Services: aggregator, gateway, collector, alerts
- Storage: Redis (hot), TimescaleDB (historical)
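
To illustrate what the `alerts` service does conceptually, the sketch below checks incoming metric events against thresholds. The `THRESHOLDS` values, the event shape, and `check_event` are assumptions made for this example; in the dev stack the events would arrive through a subscription rather than a direct function call.

```python
from dataclasses import dataclass

# Hypothetical thresholds (percent); real values would come from configuration.
THRESHOLDS: dict[str, float] = {"cpu_percent": 90.0, "memory_percent": 85.0}


@dataclass(frozen=True)
class Alert:
    host: str
    metric: str
    value: float
    limit: float


def check_event(event: dict) -> list[Alert]:
    """Return one Alert per metric in the event that exceeds its threshold."""
    alerts = []
    for metric, limit in THRESHOLDS.items():
        value = event.get(metric)
        if value is not None and value > limit:
            alerts.append(Alert(event["host"], metric, value, limit))
    return alerts


event = {"host": "nfrt", "cpu_percent": 97.2, "memory_percent": 41.0}
for alert in check_event(event):
    print(f"{alert.host}: {alert.metric}={alert.value} exceeds {alert.limit}")
```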

## Directory Structure

```
sms/
├── services/           # gRPC-based microservices (dev stack)
│   ├── collector/      # gRPC client, streams to aggregator
│   ├── aggregator/     # gRPC server, stores in Redis/TimescaleDB
│   ├── gateway/        # FastAPI, bridges gRPC to WebSocket
│   └── alerts/         # Event subscriber for threshold alerts
│
├── ctrl/               # Deployment configurations
│   ├── collector/      # Lightweight WebSocket collector
│   ├── hub/            # Local aggregator
│   ├── edge/           # Cloud dashboard
│   └── dev/            # Full stack docker-compose
│
├── proto/              # Protocol Buffer definitions
├── shared/             # Shared Python modules
├── web/                # Dashboard templates and static files
├── infra/              # Terraform for AWS deployment
└── k8s/                # Kubernetes manifests
```

## Current Setup

**Machines being monitored:**
- `mcrn` - Primary workstation (runs hub + collector)
- `nfrt` - Secondary machine (runs collector only)

**Topology:**
```
mcrn                         nfrt                   AWS
├── hub ◄─────────────────── collector              edge (sysmonstm.mcrn.ar)
│    │                                                ▲
│    └────────────────────────────────────────────────┘
└── collector
```

## Technical Stack

### Core Technologies
- **Python 3.11+** - Primary language
- **FastAPI** - Web gateway, REST endpoints, WebSocket streaming
- **gRPC** - Inter-service communication (dev stack)
- **WebSockets** - Production deployment communication
- **psutil** - System metrics collection
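
As a rough sketch of how a collector might use psutil, the example below samples a few metrics and wraps them in a JSON envelope. The function names and the envelope fields (`host`, `ts`, `metrics`) are assumptions for illustration; only the psutil calls themselves are part of that library's API. psutil is imported lazily so the envelope logic can be tried without it installed.

```python
import json
import time


def sample_metrics() -> dict[str, float]:
    """Read a few host metrics with psutil (third-party: pip install psutil)."""
    import psutil  # lazy import: only needed when actually sampling

    return {
        # With interval=None the first call returns a meaningless 0.0;
        # later calls report usage since the previous call.
        "cpu_percent": psutil.cpu_percent(interval=None),
        "memory_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage("/").percent,
    }


def build_message(host: str, metrics: dict[str, float]) -> str:
    """Wrap a sample in a JSON envelope; the field names are assumptions."""
    return json.dumps({"host": host, "ts": time.time(), "metrics": metrics})


# Demonstration with a fixed sample, so it runs without psutil installed.
msg = json.loads(build_message("mcrn", {"cpu_percent": 12.5}))
print(msg["host"], msg["metrics"])
```

On a machine with psutil installed, `build_message(socket.gethostname(), sample_metrics())` would produce a message with live values.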

### Storage (Dev Stack Only)
- **PostgreSQL/TimescaleDB** - Time-series historical data
- **Redis** - Current state, caching, event pub/sub

### Infrastructure
- **Docker Compose** - Orchestration
- **Woodpecker CI** - Build pipeline at `ppl/pipelines/sysmonstm/`
- **Registry** - `registry.mcrn.ar/sysmonstm/`

## Images

| Image | Purpose |
|-------|---------|
| `collector` | Lightweight WebSocket collector for production |
| `hub` | Local aggregator for production |
| `edge` | Cloud dashboard for production |
| `aggregator` | gRPC aggregator (dev stack) |
| `gateway` | FastAPI gateway (dev stack) |
| `collector-grpc` | gRPC collector (dev stack) |
| `alerts` | Alert service (dev stack) |

## Development Guidelines

### Code Quality
- Type hints throughout (Python 3.11+ syntax)
- Async/await patterns used consistently
- Logging (not print statements)
- Error handling at boundaries

### Docker
- Multi-stage builds for smaller images
- `--network host` for collectors (accurate network metrics)

### Configuration
- Environment variables for all config
- Sensible defaults
- No secrets in code
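
One way to follow these configuration rules is a small settings object populated from the environment. This is a sketch only: the `Settings` class, the `SYSMON_*` variable names, and the default values are invented for illustration and are not taken from the project.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical collector settings; the variable names are assumptions."""

    hub_url: str
    interval_seconds: float
    api_key: str  # secret: supplied by the environment, never hard-coded

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        # Every value has a default, so the service starts with no configuration.
        return cls(
            hub_url=env.get("SYSMON_HUB_URL", "ws://localhost:8080/ws"),
            interval_seconds=float(env.get("SYSMON_INTERVAL", "5")),
            api_key=env.get("SYSMON_API_KEY", ""),
        )


# Passing a plain dict instead of os.environ keeps the example deterministic.
settings = Settings.from_env({"SYSMON_INTERVAL": "2.5"})
print(settings.hub_url, settings.interval_seconds)
```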

## Interview/Portfolio Talking Points

### Architecture Decisions
- "3-tier for production: collector → hub → edge"
- "Hub allows local aggregation and buffering before forwarding to cloud"
- "Edge terminology shows awareness of edge computing patterns"
- "Full gRPC stack for development demonstrates microservices patterns"
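
The buffering point can be made concrete with a bounded in-memory queue that holds samples while the edge is unreachable. This is a minimal sketch assuming a drop-oldest policy; `ForwardBuffer`, `send_to_edge`, and the `edge` flag are hypothetical and do not describe how the actual hub is implemented.

```python
from collections import deque
from typing import Callable


class ForwardBuffer:
    """Bounded in-memory buffer; drops the oldest samples when full."""

    def __init__(self, send: Callable[[dict], None], max_items: int = 1000) -> None:
        self._send = send
        self._pending: deque[dict] = deque(maxlen=max_items)

    def add(self, sample: dict) -> None:
        self._pending.append(sample)

    def flush(self) -> int:
        """Send pending samples in order; stop and keep the rest on failure."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # edge unreachable: keep remaining samples for the next flush
            self._pending.popleft()
            sent += 1
        return sent


delivered: list[dict] = []
edge = {"up": False}  # simulated link state


def send_to_edge(sample: dict) -> None:
    if not edge["up"]:
        raise ConnectionError("edge unreachable")
    delivered.append(sample)


buf = ForwardBuffer(send_to_edge, max_items=2)
for i in range(3):
    buf.add({"seq": i})  # seq 0 is dropped once the buffer is full
first = buf.flush()      # edge down: nothing is sent
edge["up"] = True
second = buf.flush()     # edge back: buffered samples are forwarded
print(first, second, delivered)
```

Because the buffer lives in memory, samples are lost if the hub restarts, which is the operational-simplicity side of the in-memory vs persistent trade-off.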

### Trade-offs
- Production vs Dev: simplicity/cost vs full architecture demo
- WebSocket vs gRPC: browser compatibility vs efficiency
- In-memory vs persistent: operational simplicity vs durability