# Concurrency
> Account quota, reserved, provisioned. The "100 RPS × 200 ms" math.
## The fundamental model
Lambda concurrency = the number of execution environments processing requests at the same instant. Each environment handles exactly one invocation at a time. There is no thread pool, no event loop shared across invocations — if two requests arrive simultaneously, AWS spins up two separate environments.
The key formula: **concurrency ≈ RPS × average duration (in seconds)**. At 100 requests/s with a 200 ms average handler duration, you need 100 × 0.2 = **20 concurrent environments**. At 500 ms average, you need 50. At 2 s average, 200 — and so on. Latency optimisation directly reduces your concurrency footprint.
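This is Little's law applied to Lambda: concurrency in the system equals arrival rate times time spent in the system. A minimal sketch of the arithmetic (the function name is ours, not an AWS API):

```python
def required_concurrency(rps: float, avg_duration_ms: float) -> float:
    """Little's law for Lambda: concurrent environments needed is the
    arrival rate (req/s) times the average handler duration (s)."""
    return rps * (avg_duration_ms / 1000)

print(required_concurrency(100, 200))   # 20.0 environments
print(required_concurrency(100, 500))   # 50.0
print(required_concurrency(100, 2000))  # 200.0
```

Note the symmetry: halving your p50 duration halves your concurrency footprint at the same traffic level.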
## Account concurrency pool
Every AWS account has a regional concurrency quota — default **1 000 concurrent executions** per region, shared across all functions. When the pool is full, new invocations get throttled (sync → HTTP 429 TooManyRequestsException; async → queued and retried). Raising the limit requires a Service Quotas increase request; AWS typically grants up to 10 000 with a business justification.
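Because synchronous callers receive the 429 directly, they are responsible for retrying. A sketch of the usual pattern, jittered exponential backoff, with a stand-in exception class rather than a real SDK call:

```python
import random
import time

class ThrottledError(Exception):
    """Stand-in for the HTTP 429 TooManyRequestsException a sync invoke gets."""

def invoke_with_backoff(invoke, max_attempts=5, base_delay=0.1):
    """Retry a synchronous invocation on throttling. `invoke` is any
    callable (e.g. a wrapper around your SDK's Invoke call) that raises
    ThrottledError when the regional pool is exhausted."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            # 0.1 s, 0.2 s, 0.4 s, ... with +/-50% jitter to avoid a thundering herd
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

Async invocations get this behaviour for free: Lambda queues the event and retries internally.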
This is the single most common production surprise: one function spikes and starves all others in the same region. Reserved concurrency is the fix.
## Types of concurrency
| Type | What it does | Cost | Use for |
|------|--------------|------|---------|
| **Unreserved** | Draws from the shared regional pool on demand | Invocation + duration only | Most functions |
| **Reserved** | Carves a slice of the regional pool exclusively for this function; acts as both a floor and a ceiling | No extra charge | Protecting critical paths from noisy neighbours; throttling cost runaway |
| **Provisioned** | Pre-warms N environments; they stay initialised 24/7 | PC-hours + invocation | Latency-sensitive functions where cold starts are unacceptable |
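The "Use for" column boils down to two questions. A hypothetical decision helper (ours, not an AWS construct) that encodes the same rule of thumb:

```python
def choose_concurrency_mode(cold_start_sensitive: bool,
                            needs_isolation: bool) -> str:
    """Rule of thumb from the table above: pay for provisioned only when
    cold starts are unacceptable; reserve when a function must be
    protected from (or contained as) a noisy neighbour; otherwise
    draw from the shared pool."""
    if cold_start_sensitive:
        return "provisioned"
    if needs_isolation:
        return "reserved"
    return "unreserved"
```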
## Reserved concurrency edge cases
- Setting reserved concurrency to **0** disables the function entirely — useful as a circuit breaker.
- Reserved concurrency counts against the account pool even when idle. If you set 500 reserved on a function, only 500 remain for all other functions (at default 1 000).
- Reserved concurrency does **not** pre-warm. You still cold-start; you just can't scale past the cap.
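The second edge case above is worth making concrete: reserved capacity is subtracted from the account pool whether or not it is used. A small accounting sketch (function name is ours):

```python
def unreserved_pool(account_limit: int, reserved: dict) -> int:
    """Shared capacity left for all other functions after reserved
    carve-outs. Reserved slices count against the account limit even
    when those functions sit completely idle."""
    return account_limit - sum(reserved.values())

unreserved_pool(1000, {"critical-api": 500})  # 500 left for everything else
```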
## Burst scaling
When traffic spikes from zero, Lambda can spin up environments quickly — but not infinitely fast. The burst limit (region-dependent, typically 500–3 000 immediate) is how many environments AWS will create right now. Beyond that, it adds **500 new environments per minute**. A spike from 0 to 5 000 concurrent requests takes several minutes to fully absorb. Provisioned Concurrency or pre-warming via a ping mechanism is the fix for sudden large spikes.
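Using the figures above (a 500-environment immediate burst, then 500 more per minute — both region-dependent), the absorption time works out like this:

```python
import math

def seconds_to_absorb(target: int, burst_limit: int = 500,
                      per_minute: int = 500) -> float:
    """Rough time for Lambda to scale from zero to `target` concurrent
    environments: `burst_limit` immediately, then `per_minute` more
    each minute. Figures are illustrative defaults, not guarantees."""
    if target <= burst_limit:
        return 0.0
    return math.ceil((target - burst_limit) / per_minute) * 60.0

print(seconds_to_absorb(5000))  # 540.0 — nine minutes to reach 5 000
```

Nine minutes of partial throttling is exactly the window Provisioned Concurrency is designed to close.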
> **Interview answer template:** "Concurrency = RPS × duration. Default pool is 1 000/region. Reserved carves a slice and prevents both starvation and runaway. Provisioned pre-warms to eliminate cold starts, but you pay for idle capacity."