sysmonstm
A real-time distributed system monitoring platform that streams metrics from multiple machines to a central hub with a live web dashboard.
Overview
sysmonstm demonstrates production microservices patterns (gRPC streaming, FastAPI, event-driven architecture) while solving a real problem: monitoring development infrastructure across multiple machines.
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Collector    │     │    Collector    │     │    Collector    │
│   (Machine 1)   │     │   (Machine 2)   │     │   (Machine N)   │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         │                gRPC Streaming                 │
         └───────────────────────┼───────────────────────┘
                                 ▼
                    ┌────────────────────────┐
                    │       Aggregator       │
                    │  (gRPC Server + Redis  │
                    │     + TimescaleDB)     │
                    └────────────┬───────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              │                  │                  │
              ▼                  ▼                  ▼
     ┌────────────────┐  ┌──────────────┐   ┌──────────────┐
     │    Gateway     │  │    Alerts    │   │ Event Stream │
     │ (FastAPI + WS) │  │   Service    │   │(Redis PubSub)│
     └────────┬───────┘  └──────────────┘   └──────────────┘
              │
              │ WebSocket
              ▼
     ┌────────────────┐
     │    Browser     │
     │   Dashboard    │
     └────────────────┘
Features
- Real-time streaming: Collectors stream metrics via gRPC to a central aggregator
- Multi-machine support: Monitor any number of machines from a single dashboard
- Live dashboard: WebSocket-powered updates with real-time graphs
- Tiered storage: Redis for hot data, TimescaleDB for historical analysis
- Threshold alerts: Configurable rules for CPU, memory, disk usage
- Event-driven: Decoupled services via Redis Pub/Sub
Quick Start
# Start the full stack
docker compose up
# Open the dashboard (`open` is macOS-specific; on Linux use xdg-open or visit the URL in a browser)
open http://localhost:8000
Metrics appear within seconds. The collector runs locally by default.
Monitor Additional Machines
Run the collector on any machine you want to monitor:
# On a remote machine, point to your aggregator
COLLECTOR_AGGREGATOR_URL=your-server:50051 \
COLLECTOR_MACHINE_ID=my-laptop \
python services/collector/main.py
Architecture
Services
| Service | Port | Description |
|---|---|---|
| Collector | - | gRPC client that streams system metrics (CPU, memory, disk, network) |
| Aggregator | 50051 | gRPC server that receives metrics, stores them, publishes events |
| Gateway | 8000 | FastAPI server with REST API and WebSocket for dashboard |
| Alerts | - | Subscribes to events, evaluates threshold rules, triggers notifications |
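As a rough illustration of what "evaluates threshold rules" could mean for the Alerts service, here is a minimal sketch. The `ThresholdRule` class, the `evaluate` function, and the metric key names (`cpu_percent` and so on) are hypothetical and chosen for this example only; the real rule format lives in services/alerts/ and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdRule:
    """Hypothetical rule shape: fire when `metric` exceeds `threshold`."""
    metric: str        # illustrative key name, e.g. "cpu_percent"
    threshold: float   # percentage, 0-100
    severity: str = "warning"


def evaluate(rules, sample):
    """Return the rules that `sample` violates.

    `sample` maps metric name -> current value. Metrics missing from the
    sample are skipped rather than treated as violations.
    """
    return [
        rule for rule in rules
        if rule.metric in sample and sample[rule.metric] > rule.threshold
    ]


rules = [
    ThresholdRule("cpu_percent", 90.0, "critical"),
    ThresholdRule("memory_percent", 85.0),
    ThresholdRule("disk_percent", 95.0),
]
sample = {"cpu_percent": 97.2, "memory_percent": 41.0}

for rule in evaluate(rules, sample):
    print(f"[{rule.severity}] {rule.metric} above {rule.threshold}%")
```

The comparison is strict, so a value exactly at the threshold does not fire; whether the actual service behaves that way is not stated here.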
Infrastructure
| Component | Purpose |
|---|---|
| Redis | Current state cache, event pub/sub |
| TimescaleDB | Historical metrics with automatic downsampling |
Key Patterns
- gRPC Streaming: Collectors stream metrics continuously to the aggregator
- Event-Driven: Services communicate via Redis Pub/Sub for decoupling
- Tiered Storage: Hot data in Redis, historical in TimescaleDB
- Graceful Degradation: The system keeps running with reduced functionality if a storage backend fails
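The tiered-storage and graceful-degradation patterns can be sketched together. This is not the aggregator's actual code: `InMemoryStore` stands in for Redis and TimescaleDB so the example runs without either, and `store_metrics` is a hypothetical helper name.

```python
import logging

logger = logging.getLogger("aggregator_sketch")


class InMemoryStore:
    """Stand-in for a real backend (Redis or TimescaleDB in the project)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.rows = []

    def write(self, machine_id, metrics):
        if not self.healthy:
            raise ConnectionError(f"{self.name} unavailable")
        self.rows.append((machine_id, metrics))


def store_metrics(hot, cold, machine_id, metrics):
    """Write to both tiers; a failure in one does not block the other.

    Returns the names of the tiers that accepted the write.
    """
    accepted = []
    for store in (hot, cold):
        try:
            store.write(machine_id, metrics)
            accepted.append(store.name)
        except ConnectionError as exc:
            # Degrade instead of crashing: log and carry on with the other tier.
            logger.warning("write to %s failed: %s", store.name, exc)
    return accepted


hot = InMemoryStore("hot")
cold = InMemoryStore("cold", healthy=False)  # simulate a database outage
print(store_metrics(hot, cold, "my-laptop", {"cpu_percent": 12.5}))
```

With the cold tier down, the call still succeeds and reports only `hot`, so the live dashboard could keep updating while history is temporarily not recorded.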
Project Structure
sysmonstm/
├── proto/
│ └── metrics.proto # gRPC service definitions
├── services/
│ ├── collector/ # Metrics collection (psutil)
│ ├── aggregator/ # Central gRPC server
│ ├── gateway/ # FastAPI + WebSocket
│ └── alerts/ # Threshold evaluation
├── shared/
│ ├── config.py # Pydantic settings
│ ├── logging.py # Structured JSON logging
│ └── events/ # Event pub/sub abstraction
├── web/
│ ├── static/ # CSS, JS
│ └── templates/ # Dashboard HTML
├── scripts/
│ └── init-db.sql # TimescaleDB schema
├── docs/ # Architecture diagrams & explainers
├── docker-compose.yml
└── Tiltfile # Local Kubernetes dev
Configuration
All services use environment variables with sensible defaults:
# Collector
COLLECTOR_MACHINE_ID=my-machine # Machine identifier
COLLECTOR_AGGREGATOR_URL=localhost:50051
COLLECTOR_COLLECTION_INTERVAL=5 # Seconds between collections
# Common
REDIS_URL=redis://localhost:6379
TIMESCALE_URL=postgresql://monitor:monitor@localhost:5432/monitor
LOG_LEVEL=INFO
LOG_FORMAT=json
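To show how the collector variables above could be read with fallbacks, here is a standard-library sketch. The project itself uses Pydantic settings in shared/config.py; `CollectorSettings` and `load_collector_settings` are hypothetical names, and treating the values listed above as the defaults is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectorSettings:
    machine_id: str
    aggregator_url: str
    collection_interval: int  # seconds between collections


def load_collector_settings(env=None):
    """Read COLLECTOR_* variables, falling back to assumed defaults.

    `env` defaults to os.environ; passing a plain dict makes this testable.
    """
    env = os.environ if env is None else env
    return CollectorSettings(
        machine_id=env.get("COLLECTOR_MACHINE_ID", "my-machine"),
        aggregator_url=env.get("COLLECTOR_AGGREGATOR_URL", "localhost:50051"),
        # Environment values are strings, so convert explicitly.
        collection_interval=int(env.get("COLLECTOR_COLLECTION_INTERVAL", "5")),
    )


settings = load_collector_settings({"COLLECTOR_MACHINE_ID": "my-laptop"})
print(settings)
```

Only the variable that is set gets overridden; the other two fields keep their fallback values.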
Metrics Collected
- CPU: Overall percentage, per-core usage
- Memory: Percentage, used/available bytes
- Disk: Percentage, used bytes, read/write throughput
- Network: Bytes sent/received per second, connection count
- System: Process count, load averages (1m, 5m, 15m)
Development
Local Development with Hot Reload
# Use the override file for volume mounts
docker compose -f docker-compose.yml -f docker-compose.override.yml up
Kubernetes Development with Tilt
tilt up
Running Services Individually
# Install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r services/collector/requirements.txt
# Generate protobuf code
python -m grpc_tools.protoc -I proto --python_out=. --grpc_python_out=. proto/metrics.proto
# Run a service
python services/collector/main.py
API Endpoints
REST (Gateway)
| Endpoint | Description |
|---|---|
| GET / | Dashboard UI |
| GET /api/machines | List all monitored machines |
| GET /api/machines/{id}/metrics | Current metrics for a machine |
| GET /api/machines/{id}/history | Historical metrics |
| GET /health | Health check |
| GET /ready | Readiness check (includes dependencies) |
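A small client-side sketch for the per-machine endpoints, using only the standard library. `machine_url` and `get_json` are hypothetical helpers, the base URL assumes the Quick Start setup, and the response format is assumed to be JSON since it is not documented here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # gateway address from the Quick Start


def machine_url(machine_id, resource, base_url=BASE_URL):
    """Build /api/machines/{id}/{resource}, percent-encoding the id."""
    if resource not in ("metrics", "history"):
        raise ValueError(f"unknown resource: {resource}")
    return f"{base_url}/api/machines/{quote(machine_id, safe='')}/{resource}"


def get_json(url, timeout=5.0):
    """GET `url` and decode the body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(machine_url("my-laptop", "metrics"))

# With the stack running, a query might look like:
#   machines = get_json(f"{BASE_URL}/api/machines")
#   current = get_json(machine_url("my-laptop", "metrics"))
```

Encoding the machine id keeps ids that contain spaces or slashes from breaking the path.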
WebSocket
Connect to ws://localhost:8000/ws for real-time metric updates.
Documentation
Detailed documentation is available in the docs/ folder:
- Architecture Diagrams - System overview, data flow, deployment
- Building sysmonstm - Deep dive into implementation decisions
- Domain Applications - How these patterns apply to payment processing and other domains
Tech Stack
- Python 3.11+ with async/await throughout
- gRPC for inter-service communication
- FastAPI for REST API and WebSocket
- Redis for caching and pub/sub
- TimescaleDB for time-series storage
- psutil for system metrics collection
- Docker Compose for orchestration
License
MIT