sysmonstm

A real-time distributed system monitoring platform that streams metrics from multiple machines to a central hub with a live web dashboard.

Overview

sysmonstm demonstrates production microservices patterns (gRPC streaming, FastAPI, event-driven architecture) while solving a real problem: monitoring development infrastructure across multiple machines.

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Collector    │     │    Collector    │     │    Collector    │
│   (Machine 1)   │     │   (Machine 2)   │     │   (Machine N)   │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         │              gRPC Streaming                   │
         └───────────────────────┼───────────────────────┘
                                 ▼
                    ┌────────────────────────┐
                    │      Aggregator        │
                    │  (gRPC Server + Redis  │
                    │   + TimescaleDB)       │
                    └────────────┬───────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              │                  │                  │
              ▼                  ▼                  ▼
     ┌────────────────┐  ┌──────────────┐  ┌──────────────┐
     │    Gateway     │  │    Alerts    │  │ Event Stream │
     │ (FastAPI + WS) │  │   Service    │  │(Redis PubSub)│
     └────────┬───────┘  └──────────────┘  └──────────────┘
              │
              │ WebSocket
              ▼
     ┌────────────────┐
     │    Browser     │
     │   Dashboard    │
     └────────────────┘

Features

  • Real-time streaming: Collectors stream metrics via gRPC to a central aggregator
  • Multi-machine support: Monitor any number of machines from a single dashboard
  • Live dashboard: WebSocket-powered updates with real-time graphs
  • Tiered storage: Redis for hot data, TimescaleDB for historical analysis
  • Threshold alerts: Configurable rules for CPU, memory, disk usage
  • Event-driven: Decoupled services via Redis Pub/Sub
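
The threshold alerts feature can be pictured as a small rule evaluator. The following is a minimal sketch, not the project's actual Alerts service: the `ThresholdRule` class, the metric field names, and the sample thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdRule:
    """Hypothetical rule: fire when a metric exceeds a limit."""
    metric: str        # e.g. "cpu_percent" (assumed field name)
    threshold: float   # percentage above which the rule fires
    severity: str = "warning"


def evaluate(rules, sample):
    """Return the rules violated by one metrics sample (a dict of metric -> value)."""
    return [
        rule for rule in rules
        if rule.metric in sample and sample[rule.metric] > rule.threshold
    ]


rules = [
    ThresholdRule("cpu_percent", 90.0, "critical"),
    ThresholdRule("memory_percent", 85.0),
    ThresholdRule("disk_percent", 80.0),
]
sample = {"cpu_percent": 95.2, "memory_percent": 40.0, "disk_percent": 81.5}
fired = evaluate(rules, sample)
for rule in fired:
    print(f"[{rule.severity}] {rule.metric} above {rule.threshold}%")
```

Metrics missing from a sample are skipped rather than treated as violations, so a collector that reports only some values does not trigger false alerts.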

Quick Start

# Start the full stack
docker compose up

# Open dashboard
open http://localhost:8000

Metrics appear within seconds. The collector runs locally by default.

Monitor Additional Machines

Run the collector on any machine you want to monitor:

# On a remote machine, point to your aggregator
COLLECTOR_AGGREGATOR_URL=your-server:50051 \
COLLECTOR_MACHINE_ID=my-laptop \
python services/collector/main.py

Architecture

Services

Service     Port   Description
Collector   -      gRPC client that streams system metrics (CPU, memory, disk, network)
Aggregator  50051  gRPC server that receives metrics, stores them, publishes events
Gateway     8000   FastAPI server with REST API and WebSocket for dashboard
Alerts      -      Subscribes to events, evaluates threshold rules, triggers notifications

Infrastructure

Component    Purpose
Redis        Current state cache, event pub/sub
TimescaleDB  Historical metrics with automatic downsampling

Key Patterns

  • gRPC Streaming: Collectors stream metrics continuously to the aggregator
  • Event-Driven: Services communicate via Redis Pub/Sub for decoupling
  • Tiered Storage: Hot data in Redis, historical in TimescaleDB
  • Graceful Degradation: The system keeps running with reduced functionality if a storage backend fails
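
The tiered storage and graceful degradation patterns can be illustrated together. This is a sketch under stated assumptions: `InMemoryStore` is a hypothetical stand-in for the Redis and TimescaleDB clients, and `store_sample` is not taken from the aggregator's code. It only shows the idea that one failing backend should not stop writes to the other.

```python
import logging

logger = logging.getLogger("aggregator-sketch")


class InMemoryStore:
    """Stand-in for a storage backend (Redis or TimescaleDB in the real system)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.rows = []

    def write(self, sample):
        if not self.healthy:
            raise ConnectionError(f"{self.name} unavailable")
        self.rows.append(sample)


def store_sample(sample, stores):
    """Write to every store; a failing store is logged and skipped, not fatal."""
    succeeded = []
    for store in stores:
        try:
            store.write(sample)
            succeeded.append(store.name)
        except ConnectionError as exc:
            logger.warning("skipping %s: %s", store.name, exc)
    return succeeded


hot = InMemoryStore("redis")
history = InMemoryStore("timescaledb", healthy=False)  # simulate an outage
written = store_sample({"machine_id": "my-laptop", "cpu_percent": 12.5}, [hot, history])
print(written)
```

Returning the list of stores that accepted the write lets a caller decide whether a partial success is good enough, for example by keeping the live dashboard working while historical storage is down.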

Project Structure

sysmonstm/
├── proto/
│   └── metrics.proto        # gRPC service definitions
├── services/
│   ├── collector/           # Metrics collection (psutil)
│   ├── aggregator/          # Central gRPC server
│   ├── gateway/             # FastAPI + WebSocket
│   └── alerts/              # Threshold evaluation
├── shared/
│   ├── config.py            # Pydantic settings
│   ├── logging.py           # Structured JSON logging
│   └── events/              # Event pub/sub abstraction
├── web/
│   ├── static/              # CSS, JS
│   └── templates/           # Dashboard HTML
├── scripts/
│   └── init-db.sql          # TimescaleDB schema
├── docs/                    # Architecture diagrams & explainers
├── docker-compose.yml
└── Tiltfile                 # Local Kubernetes dev

Configuration

All services use environment variables with sensible defaults:

# Collector
COLLECTOR_MACHINE_ID=my-machine      # Machine identifier
COLLECTOR_AGGREGATOR_URL=localhost:50051
COLLECTOR_COLLECTION_INTERVAL=5      # Seconds between collections

# Common
REDIS_URL=redis://localhost:6379
TIMESCALE_URL=postgresql://monitor:monitor@localhost:5432/monitor
LOG_LEVEL=INFO
LOG_FORMAT=json
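
As an illustration of how the collector variables might be read, here is a minimal sketch using only the standard library. The project itself uses Pydantic settings (`shared/config.py`), so this is a stand-in rather than the real loader; `CollectorConfig` and `load_collector_config` are hypothetical names, and the fallback values are copied from the listing above and may not match the actual defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectorConfig:
    machine_id: str
    aggregator_url: str
    collection_interval: int  # seconds between collections


def load_collector_config(env=None):
    """Read COLLECTOR_* variables, falling back to assumed defaults."""
    env = os.environ if env is None else env
    return CollectorConfig(
        machine_id=env.get("COLLECTOR_MACHINE_ID", "my-machine"),
        aggregator_url=env.get("COLLECTOR_AGGREGATOR_URL", "localhost:50051"),
        collection_interval=int(env.get("COLLECTOR_COLLECTION_INTERVAL", "5")),
    )


# Passing a dict instead of os.environ keeps the example deterministic.
config = load_collector_config({"COLLECTOR_MACHINE_ID": "my-laptop"})
print(config)
```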

Metrics Collected

  • CPU: Overall percentage, per-core usage
  • Memory: Percentage, used/available bytes
  • Disk: Percentage, used bytes, read/write throughput
  • Network: Bytes sent/received per second, connection count
  • System: Process count, load averages (1m, 5m, 15m)
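
The per-second figures (network bytes, disk throughput) are typically derived from cumulative counters, which is how psutil reports I/O. How the collector actually computes them is not shown in this README; the helper below, `per_second_rate`, is a hypothetical sketch of the usual approach of differencing two readings.

```python
def per_second_rate(previous, current, interval_seconds):
    """Convert two cumulative counter readings into a per-second rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current - previous
    # A negative delta means the counter was reset (e.g. after a reboot);
    # report zero for that interval instead of a negative rate.
    return max(delta, 0) / interval_seconds


# Two hypothetical readings of a cumulative bytes-sent counter, 5 seconds apart.
rate = per_second_rate(previous=1_000_000, current=1_250_000, interval_seconds=5)
print(rate)  # 50000.0
```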

Development

Local Development with Hot Reload

# Use the override file for volume mounts
docker compose -f docker-compose.yml -f docker-compose.override.yml up

Kubernetes Development with Tilt

tilt up

Running Services Individually

# Install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r services/collector/requirements.txt

# Generate protobuf code
python -m grpc_tools.protoc -I proto --python_out=. --grpc_python_out=. proto/metrics.proto

# Run a service
python services/collector/main.py

API Endpoints

REST (Gateway)

Endpoint                        Description
GET /                           Dashboard UI
GET /api/machines               List all monitored machines
GET /api/machines/{id}/metrics  Current metrics for a machine
GET /api/machines/{id}/history  Historical metrics
GET /health                     Health check
GET /ready                      Readiness check (includes dependencies)
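
The endpoints above can be called from any HTTP client. The sketch below uses only the standard library; the helper names `machine_metrics_url` and `fetch_json` are hypothetical, and the response format is not documented here, so the code simply decodes whatever JSON the Gateway returns.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # Gateway address from the Quick Start


def machine_metrics_url(machine_id, base_url=BASE_URL):
    """Build the URL for a machine's current metrics, escaping the ID."""
    return f"{base_url}/api/machines/{quote(machine_id, safe='')}/metrics"


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = machine_metrics_url("my-laptop")
print(url)
# With the stack running (docker compose up), fetch the data with:
#   print(fetch_json(url))
```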

WebSocket

Connect to ws://localhost:8000/ws for real-time metric updates.

Documentation

Detailed documentation is available in the docs/ folder.

Tech Stack

  • Python 3.11+ with async/await throughout
  • gRPC for inter-service communication
  • FastAPI for REST API and WebSocket
  • Redis for caching and pub/sub
  • TimescaleDB for time-series storage
  • psutil for system metrics collection
  • Docker Compose for orchestration

License

MIT
