System Monitoring Platform

Documentation


System Overview

High-level architecture showing all services, data stores, and communication patterns.

Key Components

  • Collector: Runs on each monitored machine, streams metrics via gRPC
  • Aggregator: Central gRPC server, receives streams, normalizes data
  • Gateway: FastAPI service, WebSocket for browser, REST for queries
  • Alerts: Subscribes to events, evaluates thresholds, triggers actions
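The Alerts component's "evaluates thresholds, triggers actions" step can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the Threshold class, the event shape (a dict with "metric" and "value" keys), the metric names, and the action labels are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str   # hypothetical metric name, e.g. "cpu_percent"
    limit: float  # the rule fires when the value is strictly above this
    action: str   # hypothetical action label to trigger


def evaluate(event: dict, thresholds: list[Threshold]) -> list[str]:
    """Return the actions triggered by a single metric event."""
    return [
        t.action
        for t in thresholds
        if t.metric == event.get("metric") and event.get("value", 0.0) > t.limit
    ]


# Hypothetical rule set, for illustration only.
RULES = [
    Threshold("cpu_percent", 90.0, "page_oncall"),
    Threshold("disk_percent", 80.0, "open_ticket"),
]
```

Calling evaluate({"metric": "cpu_percent", "value": 95.0}, RULES) returns ["page_oncall"]. An event below every limit returns an empty list.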

Data Flow Pipeline

How metrics flow from collection through storage with different retention tiers.

Storage Tiers

Tier                Resolution   Retention   Use Case
Hot (Redis)         5s           5 min       Current state, live dashboard
Raw (TimescaleDB)   5s           24h         Recent detailed analysis
1-min Aggregates    1m           7d          Week view, trends
1-hour Aggregates   1h           90d         Long-term analysis
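To illustrate how 5-second raw samples could roll up into the 1-minute tier, here is a small pure-Python sketch. It is an assumption about the shape of the aggregation, not the project's implementation: the function name, the (timestamp, value) sample format, and the output fields are hypothetical. In practice this step may well happen inside the database (TimescaleDB offers continuous aggregates for this purpose), but the source does not say which approach is used.

```python
from collections import defaultdict
from statistics import mean


def downsample(samples: list[tuple[int, float]], bucket_seconds: int = 60) -> list[dict]:
    """Group (unix_timestamp, value) samples into fixed-width time buckets."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return [
        {
            "bucket": start,
            "avg": mean(values),
            "min": min(values),
            "max": max(values),
            "count": len(values),
        }
        for start, values in sorted(buckets.items())
    ]
```

Passing bucket_seconds=3600 gives the 1-hour tier with the same function. At a 5s resolution, a full 1-minute bucket holds 12 samples per metric.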

Deployment Architecture

Deployment options from local development to AWS production.

Environments

  • Local Dev: Kind + Tilt for K8s, or Docker Compose
  • Demo (EC2): Docker Compose on t2.small at sysmonstm.mcrn.ar
  • Lambda Pipeline: SQS-triggered aggregation, built to gain data processing experience

gRPC Service Definitions

Protocol Buffer service and message definitions.

Services

  • MetricsService: Client-side streaming for metrics ingestion
  • ControlService: Bidirectional streaming for collector control
  • ConfigService: Server-side streaming for config updates
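The difference between client-side and server-side streaming can be shown without any gRPC dependency. The sketch below only imitates the two call shapes with asyncio async generators. It does not use the grpc library, and all function names and message types here are hypothetical stand-ins for the real protobuf definitions.

```python
import asyncio
from typing import AsyncIterator


async def metric_stream(samples: list[float]) -> AsyncIterator[float]:
    """Stands in for a collector: yields one message per sample."""
    for sample in samples:
        await asyncio.sleep(0)  # hand control back to the loop, as a network send would
        yield sample


async def ingest(stream: AsyncIterator[float]) -> dict:
    """Client-streaming shape: many requests in, one summary response out."""
    count, total = 0, 0.0
    async for value in stream:
        count += 1
        total += value
    return {"received": count, "sum": total}


async def config_updates(versions: list[str]) -> AsyncIterator[str]:
    """Server-streaming shape: one request in, many responses out."""
    for version in versions:
        await asyncio.sleep(0)
        yield version


def run_demo() -> tuple[dict, list[str]]:
    async def main() -> tuple[dict, list[str]]:
        summary = await ingest(metric_stream([1.0, 2.0, 3.0]))
        configs = [v async for v in config_updates(["v1", "v2"])]
        return summary, configs

    return asyncio.run(main())
```

Bidirectional streaming, as listed for ControlService, combines both shapes: each side consumes the other's stream while producing its own.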

Interview Talking Points

Domain Mapping

  • Machine = Payment Processor
  • Metrics Stream = Transaction Stream
  • Thresholds = Fraud Detection
  • Aggregator = Payment Hub

gRPC Patterns

  • Client streaming (metrics)
  • Server streaming (config)
  • Bidirectional (control)
  • Health checking

Event-Driven

  • Redis Pub/Sub (current)
  • Abstraction for Kafka switch
  • Decoupled alert processing
  • Real-time WebSocket push
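One way to read "Abstraction for Kafka switch" is a small publish/subscribe interface that the rest of the code depends on, with the transport hidden behind it. The sketch below is an assumption about that design, not the project's code: EventBus and InMemoryBus are hypothetical names, and the in-memory class merely stands in for where a Redis Pub/Sub or Kafka adapter would go.

```python
from collections import defaultdict
from typing import Callable, Protocol

Handler = Callable[[dict], None]


class EventBus(Protocol):
    """Interface that publishers and subscribers code against."""

    def publish(self, topic: str, event: dict) -> None: ...

    def subscribe(self, topic: str, handler: Handler) -> None: ...


class InMemoryBus:
    """Stand-in transport; a Redis- or Kafka-backed class would expose the same methods."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Deliver to every handler registered for this topic; other topics are untouched.
        for handler in list(self._handlers.get(topic, [])):
            handler(event)


def wire_alerts(bus: EventBus, sink: list) -> None:
    """Alert processing depends only on the interface, not on the transport."""
    bus.subscribe("metrics", sink.append)
```

Because wire_alerts accepts anything that satisfies EventBus, swapping the transport would not require changes to the alert code.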

Resilience

  • Collectors are independent
  • Graceful degradation
  • Retry with backoff
  • Health checks everywhere
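"Retry with backoff" is sketched below as a generic exponential-backoff helper. The function name, default delays, and attempt count are assumptions chosen for illustration. The source does not state the actual retry policy.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    fn: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay * 1, 2, 4, ... and are capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

The sleep parameter is injectable so the helper can be exercised without real waiting. Adding random jitter to each delay is a common refinement when many collectors might reconnect at once.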

Technology Stack

Core

  • Python 3.11+
  • FastAPI
  • gRPC / protobuf
  • asyncio

Data

  • TimescaleDB
  • Redis
  • Redis Pub/Sub

Infrastructure

  • Docker
  • Kubernetes
  • Kind + Tilt
  • Terraform

CI/CD

  • Woodpecker CI
  • Kustomize
  • Container Registry