sysmonstm/CLAUDE.md
2026-01-22 12:55:50 -03:00

Distributed System Monitoring Platform

Project Overview

A real-time system monitoring platform that streams metrics from multiple machines to a central hub with a live web dashboard. Built to demonstrate production microservices patterns (gRPC, FastAPI, streaming, event-driven architecture) while solving a real problem: monitoring development infrastructure spread across several machines.

Primary Goal: Portfolio project demonstrating real-time streaming architecture
Secondary Goal: A genuinely useful tool for monitoring a multi-machine development environment
Status: Working MVP, deployed at sysmonstm.mcrn.ar

Deployment Modes

Production (3-tier)

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Collector  │────▶│     Hub     │────▶│    Edge     │
│ (each host) │     │  (local)    │     │   (AWS)     │
└─────────────┘     └─────────────┘     └─────────────┘
  • Collector (ctrl/collector/) - Lightweight agent on each monitored machine
  • Hub (ctrl/hub/) - Local aggregator, receives from collectors, forwards to edge
  • Edge (ctrl/edge/) - Cloud dashboard, public-facing
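
The hub's role in the middle tier can be illustrated with a minimal sketch. This is not the code in ctrl/hub/; the class names, field names, and JSON payload shape are assumptions made for illustration. It keeps only the latest sample per host and builds the snapshot that would be forwarded to the edge.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class MetricSample:
    """One reading reported by a collector (field names are hypothetical)."""
    host: str
    cpu_percent: float
    mem_percent: float
    timestamp: float = field(default_factory=time.time)


class Hub:
    """Keeps the latest sample per host and builds the payload for the edge."""

    def __init__(self) -> None:
        self._latest: dict[str, MetricSample] = {}

    def receive(self, sample: MetricSample) -> None:
        # A newer sample from the same host replaces the older one.
        self._latest[sample.host] = sample

    def snapshot(self) -> str:
        # Serialize current state as JSON, sorted by host for stable output.
        hosts = [asdict(self._latest[h]) for h in sorted(self._latest)]
        return json.dumps({"hosts": hosts})


hub = Hub()
hub.receive(MetricSample("mcrn", cpu_percent=12.5, mem_percent=40.0, timestamp=1.0))
hub.receive(MetricSample("nfrt", cpu_percent=3.0, mem_percent=22.0, timestamp=2.0))
hub.receive(MetricSample("mcrn", cpu_percent=18.0, mem_percent=41.0, timestamp=3.0))
payload = json.loads(hub.snapshot())
```

In a real deployment the snapshot would be sent over the WebSocket link to the edge; here it is only parsed back to show its shape.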

Development (Full Stack)

docker compose up  # Uses ctrl/dev/docker-compose.yml
  • Full gRPC-based microservices architecture
  • Services: aggregator, gateway, collector, alerts
  • Storage: Redis (hot), TimescaleDB (historical)

Directory Structure

sms/
├── services/                   # gRPC-based microservices (dev stack)
│   ├── collector/              # gRPC client, streams to aggregator
│   ├── aggregator/             # gRPC server, stores in Redis/TimescaleDB
│   ├── gateway/                # FastAPI, bridges gRPC to WebSocket
│   └── alerts/                 # Event subscriber for threshold alerts
│
├── ctrl/                       # Deployment configurations
│   ├── collector/              # Lightweight WebSocket collector
│   ├── hub/                    # Local aggregator
│   ├── edge/                   # Cloud dashboard
│   └── dev/                    # Full stack docker-compose
│
├── proto/                      # Protocol Buffer definitions
├── shared/                     # Shared Python modules
├── web/                        # Dashboard templates and static files
├── infra/                      # Terraform for AWS deployment
└── k8s/                        # Kubernetes manifests
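
The tree above describes services/alerts/ as an event subscriber for threshold alerts. A minimal sketch of what a threshold check might look like follows; the Threshold type, the metric names, and the message format are hypothetical and not taken from the service itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Upper limit for a single metric (hypothetical structure)."""
    metric: str
    limit: float


def check(sample: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return one message per metric whose value exceeds its limit."""
    alerts: list[str] = []
    for t in thresholds:
        value = sample.get(t.metric)
        # Metrics missing from the sample are skipped rather than treated as errors.
        if value is not None and value > t.limit:
            alerts.append(f"{t.metric}={value} exceeds {t.limit}")
    return alerts


rules = [Threshold("cpu_percent", 90.0), Threshold("mem_percent", 80.0)]
messages = check({"cpu_percent": 95.0, "mem_percent": 40.0}, rules)
```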

Current Setup

Machines being monitored:

  • mcrn - Primary workstation (runs hub + collector)
  • nfrt - Secondary machine (runs collector only)

Topology:

mcrn                          nfrt                    AWS
├── hub ◄─────────────────── collector              edge (sysmonstm.mcrn.ar)
│    │                                                ▲
│    └────────────────────────────────────────────────┘
└── collector

Technical Stack

Core Technologies

  • Python 3.11+ - Primary language
  • FastAPI - Web gateway, REST endpoints, WebSocket streaming
  • gRPC - Inter-service communication (dev stack)
  • WebSockets - Production deployment communication
  • psutil - System metrics collection

Storage (Dev Stack Only)

  • PostgreSQL/TimescaleDB - Time-series historical data
  • Redis - Current state, caching, event pub/sub

Infrastructure

  • Docker Compose - Orchestration
  • Woodpecker CI - Build pipeline at ppl/pipelines/sysmonstm/
  • Registry - registry.mcrn.ar/sysmonstm/

Images

Image            Purpose
collector        Lightweight WebSocket collector for production
hub              Local aggregator for production
edge             Cloud dashboard for production
aggregator       gRPC aggregator (dev stack)
gateway          FastAPI gateway (dev stack)
collector-grpc   gRPC collector (dev stack)
alerts           Alert service (dev stack)

Development Guidelines

Code Quality

  • Type hints throughout (Python 3.11+ syntax)
  • Async/await patterns used consistently
  • Logging (not print statements)
  • Error handling at boundaries

Docker

  • Multi-stage builds for smaller images
  • --network host for collectors, so network metrics reflect the host's interfaces rather than the container's

Configuration

  • Environment variables for all config
  • Sensible defaults
  • No secrets in code

Interview/Portfolio Talking Points

Architecture Decisions

  • "3-tier for production: collector → hub → edge"
  • "Hub allows local aggregation and buffering before forwarding to cloud"
  • "Edge terminology shows awareness of edge computing patterns"
  • "Full gRPC stack for development demonstrates microservices patterns"
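
The buffering point can be backed with a concrete sketch. This assumes a bounded in-memory buffer that drops the oldest entries when the edge is unreachable; the class name and the drop-oldest policy are illustrative choices, not a description of the actual hub.

```python
from collections import deque


class ForwardBuffer:
    """Bounded FIFO: when full, the oldest item is discarded on push."""

    def __init__(self, max_items: int) -> None:
        self._items: deque[dict] = deque(maxlen=max_items)

    def push(self, item: dict) -> None:
        # deque(maxlen=...) silently evicts from the left when full.
        self._items.append(item)

    def drain(self) -> list[dict]:
        # Hand over everything buffered so far and start empty again.
        batch = list(self._items)
        self._items.clear()
        return batch


buf = ForwardBuffer(max_items=3)
for seq in range(5):
    buf.push({"seq": seq})
batch = buf.drain()
```

Dropping the oldest samples suits live monitoring, where recent readings matter most; a durable queue would be the alternative if no data loss were acceptable.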

Trade-offs

  • Production vs Dev: simplicity/cost vs full architecture demo
  • WebSocket vs gRPC: browser compatibility vs efficiency
  • In-memory vs persistent: operational simplicity vs durability