STELLAR AIR

NOVA Operations Platform — Architecture

WALKTHROUGH

A guided tour of the platform — start here for a narrative entry point before diving into the diagrams.

The problem

Stellar Air's operations need two things from the same underlying data. Passenger-facing teams need clear notifications when a flight is disrupted. Ops teams need shift-handover briefs that categorise every open issue by urgency. Both views ride on the same feeds — flights, weather, crew, maintenance — but with different slices, tones, and audiences. This platform unifies them through a shared MCP tool infrastructure.

Architecture at a glance

Vue UI → Kong Konnect (optional gateway) → FastAPI → LangGraph agents → MCP clients → three domain-scoped MCP servers → live APIs (OpenMeteo, FAA) and scenario data. The System Architecture diagram below shows the full picture.

Data layer

Domain models live in mcp_servers/data/models.py — Pydantic types with enums for flight status, delay causes, and crew roles. Four scenarios (normal_ops, weather_disruption_ord, maintenance_delay_sfo, crew_swap_ewr) are Python modules loaded lazily by mcp_servers/data/scenarios/manager.py; each is a complete, consistent dataset switchable from the UI at runtime. Weather comes live from OpenMeteo (mcp_servers/data/real/openmeteo.py) — real forecasts along calculated route waypoints. Airport status comes live from the FAA NASSTATUS feed (mcp_servers/data/real/faa.py). Neither live source requires an API key.
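
As a rough illustration of the models layer, a Pydantic type with a status enum might look like the sketch below; the field names and enum members here are assumptions, not the actual schema in models.py.

    from enum import Enum
    from pydantic import BaseModel

    class FlightStatus(str, Enum):        # enum members are illustrative
        ON_TIME = "on_time"
        DELAYED = "delayed"
        CANCELLED = "cancelled"

    class FlightData(BaseModel):          # field names are assumptions
        flight_number: str
        origin: str
        destination: str
        status: FlightStatus
        delay_minutes: int = 0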

MCP servers

Three servers scoped by access domain. shared exposes the data both agents need — flight status/details, route weather, hub forecasts, airport status/congestion, maintenance flags, and a delay_explainer prompt template. ops adds crew duty, rebookings, a handover-brief prompt, and the handover narrative generator; only the Handover agent connects to it. passenger adds the notification generator and a passenger-notification prompt with selectable tone; only the FCE agent connects to it. Each server declares tools, resources, and prompts.
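
A minimal sketch of how a server could declare one of these tools, using the Python MCP SDK's FastMCP helper; whether the project uses FastMCP directly is an assumption, and the function body is a stand-in.

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("shared")

    @mcp.tool()
    def get_flight_status(flight_number: str) -> dict:
        """Current status record for a flight."""
        # stand-in body; the real tool reads the active scenario or live feed
        return {"flight_number": flight_number, "status": "delayed", "delay_minutes": 45}

    if __name__ == "__main__":
        mcp.run()   # stdio transport by default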

MCP client

agents/shared/mcp_client.py defines MCPMultiClient plus a per-agent profile that declares which servers to connect to. Calls are namespaced by server name — mcp.call_tool('shared', 'get_flight_status', …). Tool results, resource reads, and prompt gets share a common parser and a tool runner that wraps each call in a Langfuse span with timeout and error collection (agents/shared/parser.py, agents/shared/tool_runner.py).
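
A hedged usage sketch: the namespaced call_tool('server', 'tool', args) form is from the source, while the profile keyword and the shape of the connect_servers context manager are assumptions.

    from agents.shared.mcp_client import connect_servers   # project module

    async def check_flight(flight_number: str) -> dict:
        # profile name and keyword are assumptions; call namespacing is as documented
        async with connect_servers(profile="fce") as mcp:
            return await mcp.call_tool("shared", "get_flight_status",
                                       {"flight_number": flight_number})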

Agents

The FCE agent (agents/fce.py) is a four-node LangGraph: triage → gather → synthesize → format. The gather node fires five MCP tool calls in parallel via asyncio.gather — route weather, airport status, airport congestion, flight details, and crew notes — each wrapped in asyncio.wait_for with a 15-second timeout. The synthesis node calls generate_notification; if any gather call failed, the prompt is told which sources are missing and omits them rather than hallucinating.
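
Stripped of the Langfuse plumbing, the gather node reduces to the pattern below (a sketch; call_tool stands in for the wrapped runner from tool_runner.py, and only three of the five calls are spelled out):

    import asyncio

    TIMEOUT_S = 15

    async def gather_context(call_tool, flight):
        tasks = {
            "route_weather":  call_tool("shared", "get_route_weather", {"flight": flight}),
            "airport_status": call_tool("shared", "get_airport_status", {"code": flight["origin"]}),
            "flight_details": call_tool("shared", "get_flight_details", {"flight": flight}),
            # ...airport congestion and crew notes follow the same pattern
        }
        wrapped = [asyncio.wait_for(coro, TIMEOUT_S) for coro in tasks.values()]
        outcomes = await asyncio.gather(*wrapped, return_exceptions=True)
        results, errors = {}, {}
        for name, outcome in zip(tasks, outcomes):
            if isinstance(outcome, Exception):
                errors[name] = str(outcome)   # synthesis is told which sources are missing
            else:
                results[name] = outcome
        return results, errors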

The Handover agent (agents/handover.py) scans every hub in parallel, scores each disruption with a weighted severity × time-sensitivity function (delay minutes, crew duty limits, passenger impact, connection risk), and categorises the results into IMMEDIATE / MONITOR / FYI.
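
The scoring idea, sketched with placeholder weights and thresholds (the agent's real tuning is not shown in this doc):

    def score_disruption(delay_minutes, crew_duty_margin_min, passengers_affected,
                         missed_connections, minutes_until_departure) -> float:
        # weighted severity; weights are illustrative
        severity = (0.4 * min(delay_minutes / 120, 1.0)
                    + 0.3 * (1.0 if crew_duty_margin_min < 60 else 0.0)
                    + 0.2 * min(passengers_affected / 300, 1.0)
                    + 0.1 * min(missed_connections / 50, 1.0))
        # sooner departures are more time-sensitive
        time_sensitivity = max(0.1, 1.0 - minutes_until_departure / 360)
        return severity * time_sensitivity

    def categorise(score: float) -> str:
        # thresholds are placeholders
        if score >= 0.5:
            return "IMMEDIATE"
        if score >= 0.2:
            return "MONITOR"
        return "FYI"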

API layer

FastAPI (api/main.py) runs agents asynchronously: POST to /agents/fce returns a run_id immediately and the client polls /agents/runs/{run_id}. An EventHub broadcasts lifecycle events over WebSocket — agent_start, node_enter/node_exit, tool_call_end/tool_call_error, agent_end — so the UI can render the agent's internals live. A background task prunes completed runs after one hour. Configuration is centralised in a Pydantic Settings class (api/config.py); HTTP errors surface as proper status codes, not as 200 responses with an error body.
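
The submit-then-poll shape, reduced to a minimal sketch (route paths are from the source; the run bookkeeping is simplified and omits the EventHub):

    import asyncio
    import uuid
    from fastapi import FastAPI, HTTPException

    app = FastAPI()
    runs: dict[str, dict] = {}   # in-memory run store, pruned by a background task

    @app.post("/agents/fce")
    async def start_fce(payload: dict):
        run_id = str(uuid.uuid4())
        runs[run_id] = {"status": "running", "result": None}
        asyncio.create_task(_run_fce(run_id, payload))   # agent executes in the background
        return {"run_id": run_id}

    @app.get("/agents/runs/{run_id}")
    async def get_run(run_id: str):
        if run_id not in runs:
            raise HTTPException(404, "unknown run_id")   # real status code, not a 200 error body
        return runs[run_id]

    async def _run_fce(run_id: str, payload: dict):
        # placeholder for the real LangGraph invocation
        runs[run_id] = {"status": "completed", "result": {"notification": "..."}}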

Kong Konnect

Kong sits in front as an optional API gateway — rate limiting, request analytics, the path to authentication. The UI reads a gateway URL from local storage or VITE_KONG_PROXY_URL; when empty it falls back to direct FastAPI calls. Kong is additive, not required, so the app keeps working even if the gateway is offline.

Frontend

Vue 3 SPA built on the internal soleprint-ui framework. Four tabs: Operations (run agents, see results), Internals (live tool-call stream over WebSocket via useAgentEvents), Data (inspect and edit the active scenario), and Settings (LLM provider, gateway URL). The internals view is the most useful one for understanding what the agent does on each run.

Testing

69 tests with dual-mode transport (tests/base.py). Default mode runs against ASGI in-process — fast, no server needed. Set CONTRACT_TEST_MODE=live and CONTRACT_TEST_URL=… to run the same assertions over real HTTP against any deployed instance.

Deployment & CI

Woodpecker CI (.woodpecker/build.yml) builds the API and UI images on push to main and pushes them to a private registry. ctrl/deploy.sh has two modes — rsync (copy source, build on the server, for fast iteration) and edge (pull tagged images from the registry, for production). Production runs as docker-compose on EC2 (ctrl/edge/docker-compose.yml) behind nginx, optionally behind Kong. Langfuse runs in a separate Kind cluster and is shared across projects.

SYSTEM ARCHITECTURE

End-to-end view: Vue UI → Kong gateway (optional) → FastAPI → MCP servers → live and scenario data sources. Langfuse (separate shared cluster) traces every agent run and tool call.

[Diagram: System Architecture. Edge legend: Live API · Scenario data · MCP protocol]

MCP SERVER TOPOLOGY

Three servers scoped by access domain. Each exposes tools, resources, and prompts. FCE connects to shared + passenger. Handover connects to shared + ops.

[Diagram: MCP Servers. Shared, Ops, and Passenger servers; solid edges = tool calls, dashed = resource reads, dotted = prompt gets]

FCE AGENT — BEHIND EVERY DEPARTURE

Passenger notification agent. Triages flight status, gathers context from 5 parallel tool calls (including live weather and FAA data), synthesizes an empathetic notification.

[Diagram: FCE Agent]

SHIFT HANDOVER AGENT

Ops briefing agent. Scans all hubs in parallel, scores issues by severity × time sensitivity, categorizes into IMMEDIATE / MONITOR / FYI, generates a structured brief.

[Diagram: Handover Agent]

DATA FLOW — REAL vs MOCK

Weather and FAA airport status are live (no API key). Flight, crew, passenger, and maintenance data are scenario-based fixtures switchable from the UI.

[Diagram: Data Flow. Legend: Live data (no API key) · Scenario data (switchable)]

DEPLOYMENT

Kind cluster for dev (Tilt), docker-compose for EC2 production (nova-api + nova-ui on shared gateway network). Woodpecker CI builds images on push to main. EC2 nginx proxies stellarair.mcrn.ar → container; Kong Konnect available as optional governance layer.

[Diagram: Deployment]

REPOSITORY STRUCTURE

Monorepo: MCP servers, agents, API, Vue UI (with shared component framework), and deployment configs.

stellar-ops/
├── mcp_servers/
│   ├── shared/              server.py · tools.py · resources.py · prompts.py
│   │   └── tools: get_route_weather · get_hub_forecasts · get_airport_status
│   │             get_flight_status · get_flight_details · get_irregular_ops
│   │             get_airport_congestion · get_maintenance_flags
│   ├── ops/                 server.py · tools.py · resources.py · prompts.py
│   │   └── tools: get_crew_notes · get_crew_duty_status · get_pending_rebookings
│   │             generate_narrative
│   ├── passenger/           server.py · tools.py · resources.py · prompts.py
│   │   └── tools: generate_notification
│   ├── shared_llm.py         multi-provider: Groq · Anthropic · Bedrock · OpenAI
│   └── data/
│       ├── models.py         FlightData · CrewMember · Passenger · MELItem · HubInfo
│       ├── real/             openmeteo.py · faa.py
│       └── scenarios/        normal_ops · weather_disruption_ord · maintenance_delay_sfo · crew_swap_ewr
├── agents/
│   ├── fce.py                FCE — "Behind Every Departure" (passenger notifications)
│   ├── handover.py           Shift Handover (ops brief: IMMEDIATE / MONITOR / FYI)
│   └── shared/
│       ├── mcp_client.py     MCPMultiClient + connect_servers context manager
│       ├── parser.py         parse_tool_result · parse_resource_result · parse_prompt_result
│       └── tool_runner.py    build_tool_caller — timeout · Langfuse span · error collection
├── api/
│   ├── main.py               FastAPI: agents, scenarios, WebSocket, /health, Langfuse traces
│   └── config.py             Pydantic Settings — centralized env var reads
├── ui/
│   ├── framework/            soleprint-ui (shared component library)
│   └── app/                  Vue 3 SPA — Operations · Internals · Data · Settings
│       └── src/config.ts     Kong proxy URL + API/WS base
├── ctrl/
│   ├── Dockerfile.api/ui     Container builds
│   ├── nginx.conf            UI nginx (proxies /agents /scenarios /config /health /ws)
│   ├── k8s/                  base/ + overlays/dev/ (Kustomize)
│   ├── Tiltfile              Dev environment (Kind cluster: unt)
│   ├── edge/                 Production docker-compose (nova-api + nova-ui on gateway net)
│   └── deploy.sh             rsync (bypass CI) · edge (pull registry images)
├── tests/                  69 tests: models · clients · MCP · scenarios · agents
│   └── base.py               dual-mode: inprocess (default) · live (CONTRACT_TEST_MODE=live)
├── .woodpecker/            CI pipeline — build API + UI, push to registry.mcrn.ar
├── docs/                   Architecture graphs (this page)
└── .mcp.json                 Claude Code integration — 3 servers

DESIGN NOTES

Rationale behind the non-obvious choices, and a roadmap of deferred improvements. Protocol references link to the MCP spec at modelcontextprotocol.io.

Concurrency model

Everything runs on one OS thread under asyncio — no GIL contention, no thread locks. Shared mutable state (runs: dict, event_hub._clients: set) is safe because mutations are atomic relative to the event loop scheduler, and disconnects happen between awaits, so broadcast iteration is race-free. The FCE agent creates five tasks with asyncio.create_task and awaits them with asyncio.gather — five MCP tool calls run concurrently but cooperatively. This only breaks once the run store grows large enough to want sharding across processes, at which point the in-process guarantees evaporate and a Redis-backed store becomes necessary (see Roadmap).
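
A sketch of why broadcast iteration stays race-free under this model: clients are only added or removed between awaits, so iterating a snapshot of the set is enough (method names are illustrative; only _clients is from the source).

    import asyncio

    class EventHub:
        def __init__(self):
            self._clients: set = set()   # mutated only from the single event-loop thread

        def register(self, ws):
            self._clients.add(ws)

        def unregister(self, ws):
            self._clients.discard(ws)

        async def broadcast(self, event: dict):
            # iterate a snapshot: a client disconnecting while we await send_json
            # removes itself from self._clients, not from this copy
            for ws in list(self._clients):
                try:
                    await ws.send_json(event)
                except Exception:
                    self.unregister(ws)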

Stateless API, stateful MCP subprocesses

Each agent run spawns three MCP server subprocesses over stdio. This is wasteful per-request (~500 ms cold-start) but has one decisive advantage: full isolation. No shared scenario state across runs, no mutex on the scenario manager, no "wait, whose data was this?". The path forward is Streamable HTTP transport with long-lived servers — same tool code, different transport — which is a config change rather than a rewrite.
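
For reference, spawning one such stdio subprocess with the Python MCP SDK looks roughly like the sketch below; the SDK calls are standard, the server module path is an assumption.

    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    async def call_shared_tool():
        params = StdioServerParameters(command="python",
                                       args=["-m", "mcp_servers.shared.server"])  # path assumed
        async with stdio_client(params) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()   # protocol handshake, capability discovery
                return await session.call_tool("get_flight_status",
                                               {"flight_number": "SA123"})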

Domain-scoped MCP servers

Three servers — shared, ops, passenger — not one server with RBAC filtering. The passenger agent literally cannot call get_crew_duty_status because it never connects to the ops server; the capability isn't even discoverable. That is a security boundary by architecture, not by authorization: with filtering, every filter bug becomes a security bug, whereas MCP is a capability protocol, so using its native scoping is cleaner than bolting auth on top. If ops tools ever move to a separate team or repo, they just become a separately deployed MCP server — agents update their profile, not their code.
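
The per-agent profile is essentially a declaration of which servers an agent can even see, sketched as plain data below (the structure is an assumption; the scoping itself is as described above).

    # Which MCP servers each agent connects to. The passenger-facing agent never
    # connects to "ops", so ops tools are not even discoverable to it.
    AGENT_PROFILES = {
        "fce":      ["shared", "passenger"],
        "handover": ["shared", "ops"],
    }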

Tools, Resources, and Prompts

All three MCP primitives are used. Tools are actions or queries with potential side effects: get_flight_status, generate_notification. Resources are read-only data with URIs: ops://hubs/{code}, ops://handover/latest — a dynamic resource (updated after each handover) is still a resource because reading it has no side effects. Prompts are server-versioned templates: delay_explainer(cause_code, audience), passenger-notification(tone). The split matters because it lets the server own prompt versioning — update the template on the server and every client picks it up without a redeploy.
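
A sketch of the other two primitives on the ops server, again with FastMCP decorators (the URI and prompt intent are from the source; names and bodies are stand-ins):

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("ops")

    @mcp.resource("ops://hubs/{code}")
    def hub_info(code: str) -> str:
        """Read-only hub record; reading it has no side effects."""
        return f'{{"hub": "{code}", "status": "..."}}'   # stand-in body

    @mcp.prompt()
    def handover_brief(shift: str) -> str:
        """Server-versioned template; update here and every client picks it up."""
        return f"Summarise all open issues for the {shift} shift handover, grouped by urgency."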

Why MCP over function calling, LangChain, or direct APIs

MCP wins when there are multiple consumers of the same tools (here, both a LangGraph agent and Claude Code), when dynamic tool discovery matters, and when protocol-level contracts are worth having. Provider function calling (OpenAI, Anthropic) bakes tool definitions into prompts and locks to one vendor. LangChain tools couple to LangChain's abstractions. Direct API calls are the N×M integration problem. MCP doesn't replace function calling — the LLM still uses its native tool-calling mechanism — it standardises the execution layer underneath.

Approach          | Strengths                                            | Weaknesses
MCP               | Standard, discoverable, client-agnostic, composable  | Extra process, protocol overhead for simple cases
Function calling  | Simple, no extra infrastructure                      | Provider-locked, no runtime discovery, definitions duplicated per call
LangChain tools   | Tight framework integration                          | Coupled to LangChain, not usable outside
Direct API calls  | No abstraction overhead                              | N×M integration problem, no standardisation

LLM provider abstraction

One generate(system_prompt, user_content) function in mcp_servers/shared_llm.py with four backends: Groq (default, free), Anthropic, Bedrock, and any OpenAI-compatible endpoint. Selection happens at runtime via LLM_PROVIDER. LangChain's provider abstraction is heavier than needed here — string in, string out is enough — and switching providers touches one env var rather than the agent code.

Every narrative tool also has a structured template fallback. Response format is identical: {"text": str, "provider": str}. The UI surfaces the provider as a badge, so it's always visible whether a response came from an LLM or the template — honest about what mode the system is in. Tests pass without any API key; the demo works without any API key.
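
Reduced to its shape, the abstraction is a dispatch plus a fallback, always returning the same response format; the sketch below omits the actual provider client calls, and the fallback's provider label is an assumption.

    import os

    def generate(system_prompt: str, user_content: str) -> dict:
        provider = os.environ.get("LLM_PROVIDER", "groq")
        try:
            text = _call_provider(provider, system_prompt, user_content)
            return {"text": text, "provider": provider}
        except Exception:
            # structured template fallback: same shape, honest provider label
            return {"text": _template_fallback(user_content), "provider": "template"}

    def _call_provider(provider: str, system_prompt: str, user_content: str) -> str:
        # one branch per backend (Groq, Anthropic, Bedrock, OpenAI-compatible);
        # client calls omitted in this sketch
        raise NotImplementedError

    def _template_fallback(user_content: str) -> str:
        return f"[templated notification based on: {user_content[:80]}]"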

Scenarios in memory, not a database

Scenarios are Python modules, versioned with git, loaded lazily by the scenario manager. They are deliberately designed datasets, not user-generated content — git is more valuable than CRUD for them, and switching scenarios is a config change rather than a data migration. The reload-on-subprocess-spawn pattern sidesteps the cache-invalidation problem entirely. This would break once scenarios became per-tenant or grew beyond ~50 MB — then it's a database.
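
Lazy loading here can be as small as an importlib lookup keyed by scenario name; a sketch, since the manager's real interface isn't shown in this doc.

    import importlib

    _SCENARIOS = ("normal_ops", "weather_disruption_ord",
                  "maintenance_delay_sfo", "crew_swap_ewr")

    def load_scenario(name: str):
        if name not in _SCENARIOS:
            raise ValueError(f"unknown scenario: {name}")
        # imported on first use; each MCP subprocess gets a fresh copy,
        # which is what sidesteps cache invalidation
        return importlib.import_module(f"mcp_servers.data.scenarios.{name}")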

Dual-mode tests

tests/base.py supports two transports with the same 69 assertions. Default (inprocess) uses httpx.AsyncClient over ASGI — no server needed. live mode runs real HTTP against any CONTRACT_TEST_URL, so the same tests validate a deployed instance. Contract tests are definitionally transport-agnostic; duplicating them into two files would be the bug factory every project eventually regrets.
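
The transport switch boils down to choosing an httpx client; the env var names are from the source, the fixture shape is an assumption.

    import os
    import httpx
    from api.main import app   # project module

    def make_client() -> httpx.AsyncClient:
        if os.environ.get("CONTRACT_TEST_MODE") == "live":
            # real HTTP against a deployed instance
            return httpx.AsyncClient(base_url=os.environ["CONTRACT_TEST_URL"])
        # default: in-process ASGI, no server needed
        return httpx.AsyncClient(transport=httpx.ASGITransport(app=app),
                                 base_url="http://testserver")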

Kong as additive

The app works with or without Kong. When VITE_KONG_PROXY_URL is empty the UI calls FastAPI directly; when set it routes through Kong Konnect for rate limiting, analytics, and the path to auth. Graceful degradation beats a broken demo — especially relevant when the gateway sits on a trial subscription with a finite lifetime.

Langfuse in a shared cluster

Langfuse runs in its own Kind cluster separate from the app cluster. The v3 stack needs ClickHouse, Redis, MinIO, and a worker — four extra pods that aren't project-specific. Putting it in a shared cluster means every project points LANGFUSE_HOST at the same instance: one dashboard, one set of keys, one upgrade path. That's how Langfuse belongs in production — shared infra, not per-service.

Timeouts, TTL cleanup, error handling

Every MCP tool call is wrapped in asyncio.wait_for with a 15-second timeout — long enough to catch real hangs without false positives from slow-but-alive APIs (OpenMeteo and FAA typically respond in under 2 s). On timeout the span is marked ERROR in Langfuse, the error is added to the run's error list, and the agent continues with partial data. The notification prompt is told which sources are missing and omits them rather than hallucinating.

The in-memory run store is pruned by a background task that removes completed or errored runs older than one hour. Errors surface with proper HTTP status codes — HTTPException(404, …) for missing resources, 400 for invalid requests — rather than 200 responses with an error body, so clients can distinguish failure without parsing the payload.
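
The pruning task is a small loop; the one-hour TTL is from the source, while the sweep interval and run-record fields are assumptions.

    import asyncio
    import time

    RUN_TTL_S = 3600

    async def prune_runs(runs: dict):
        while True:
            await asyncio.sleep(60)   # sweep interval is an assumption
            now = time.monotonic()
            expired = [rid for rid, run in runs.items()
                       if run.get("status") in ("completed", "error")
                       and now - run.get("finished_at", now) > RUN_TTL_S]
            for rid in expired:
                runs.pop(rid, None)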

Roadmap

Items deferred intentionally — the system works without them, and each is a clean extension rather than a rewrite.

  • MCP over Streamable HTTP. Replace subprocess-per-run with long-lived server processes. Becomes worthwhile once cold-start latency matters in aggregate or once MCP needs to serve multiple API replicas.
  • Redis-backed run store and event bus. Enables multi-instance WebSocket broadcast and survives API restarts. Necessary as soon as the API scales past a single process.
  • Database-backed scenarios. Replace the in-memory modules with a datastore once scenarios need to be per-tenant or grow beyond what fits comfortably in git.
  • Circuit breakers on external APIs. Exponential backoff and breakers on FAA and OpenMeteo via tenacity. Worth doing once those APIs have their first real outage.
  • Kong Key Auth. Per-consumer access control and per-agent rate limits. Unlocks multi-tenant use and a formal API-key lifecycle.