## Concurrency
Account quota, reserved, provisioned. The "100 RPS × 200 ms" math.
### The fundamental model
Lambda concurrency = the number of execution environments processing requests at the same instant. Each environment handles exactly one invocation at a time. There is no thread pool, no event loop shared across invocations — if two requests arrive simultaneously, AWS spins up two separate environments.
The key formula: concurrency ≈ RPS × average duration (in seconds). At 100 requests/s with a 200 ms average handler duration, you need 100 × 0.2 = 20 concurrent environments. At 500 ms average, you need 50. At 2 s average, 200 — and so on. Latency optimisation directly reduces your concurrency footprint.
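This is just Little's law (L = λW) applied to execution environments. A minimal helper to make the arithmetic concrete (the function name is illustrative, not an AWS API):

```python
import math

def required_concurrency(rps: float, avg_duration_s: float) -> int:
    """Estimate the concurrent execution environments needed to absorb
    `rps` requests/second at `avg_duration_s` average handler duration.
    Round up: a fractional environment still occupies a whole one."""
    return math.ceil(rps * avg_duration_s)

print(required_concurrency(100, 0.2))  # 100 RPS x 200 ms -> 20
print(required_concurrency(100, 2.0))  # same traffic at 2 s -> 200
```

Halving the average duration halves the estimate, which is why latency work pays off twice: faster responses and a smaller slice of the account pool.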
### Account concurrency pool
Every AWS account has a regional concurrency quota — default 1 000 concurrent executions per region, shared across all functions. When the pool is full, new invocations get throttled (sync → HTTP 429 TooManyRequestsException; async → queued and retried). Raising the limit requires a Service Quotas increase request; AWS typically grants up to 10 000 with a business justification.
This is the single most common production surprise: one function spikes and starves all others in the same region. Reserved concurrency is the fix.
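Applying the fix is one API call. A sketch using boto3's real `PutFunctionConcurrency` operation — the function name is a placeholder, and the boto3 client is passed in (create one with `boto3.client("lambda")`):

```python
def set_reserved_concurrency(lambda_client, function_name: str, limit: int) -> None:
    """Carve `limit` executions out of the regional pool exclusively for
    this function. It can now never be starved by a noisy neighbour --
    and can never scale past `limit` itself."""
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )

# Usage (requires AWS credentials):
#   import boto3
#   set_reserved_concurrency(boto3.client("lambda"), "checkout-handler", 100)
```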
### Types of concurrency
| Type | What it does | Cost | Use for |
|---|---|---|---|
| Unreserved | Draws from the shared regional pool on demand | Invocation + duration only | Most functions |
| Reserved | Carves a slice of the regional pool exclusively for this function; acts as both a floor and a ceiling | No extra charge | Protecting critical paths from noisy neighbours; throttling cost runaway |
| Provisioned | Pre-warms N environments; they stay initialised 24/7 | Billed for the configured capacity (per GB-second) + invocation + duration | Latency-sensitive functions where cold starts are unacceptable |
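Provisioned concurrency is configured the same way, via boto3's real `PutProvisionedConcurrencyConfig` operation. One wrinkle worth showing: it must target a published version or alias, never `$LATEST`. Function and alias names below are placeholders:

```python
def set_provisioned_concurrency(lambda_client, function_name: str,
                                qualifier: str, count: int) -> None:
    """Keep `count` environments initialised for the given version or
    alias. They are billed while configured, whether or not they serve
    traffic -- this is what eliminates cold starts."""
    lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=qualifier,          # a version number or alias, not $LATEST
        ProvisionedConcurrentExecutions=count,
    )

# Usage (requires AWS credentials):
#   import boto3
#   set_provisioned_concurrency(boto3.client("lambda"), "checkout-handler", "live", 50)
```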
### Reserved concurrency edge cases
- Setting reserved concurrency to 0 disables the function entirely — useful as a circuit breaker.
- Reserved concurrency counts against the account pool even when idle. If you set 500 reserved on a function, only 500 remain for all other functions (at default 1 000).
- Reserved concurrency does not pre-warm. You still cold-start; you just can't scale past the cap.
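The circuit-breaker trick from the first bullet is a pair of real boto3 calls: `PutFunctionConcurrency` with 0 to trip it, `DeleteFunctionConcurrency` to reset (deleting the setting returns the function to the shared pool). Function name is a placeholder:

```python
def trip_breaker(lambda_client, function_name: str) -> None:
    """Reserved concurrency of 0 throttles every invocation -- the
    function is effectively disabled without redeploying anything."""
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=0,
    )

def reset_breaker(lambda_client, function_name: str) -> None:
    """Remove the reserved setting; the function draws from the shared
    regional pool again."""
    lambda_client.delete_function_concurrency(FunctionName=function_name)
```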
### Burst scaling
When traffic spikes from zero, Lambda can spin up environments quickly — but not infinitely fast. The burst limit (region-dependent, typically 500–3 000 immediate) is how many environments AWS will create right now. Beyond that, it adds 500 new environments per minute. A spike from 0 to 5 000 concurrent requests takes several minutes to fully absorb. Provisioned Concurrency or pre-warming via a ping mechanism is the fix for sudden large spikes.
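The ramp-up model above (an immediate burst, then +500 environments per minute) can be turned into a back-of-envelope estimate of how long a spike stays partially throttled:

```python
import math

def minutes_to_absorb(target: int, burst_limit: int, ramp_per_min: int = 500) -> int:
    """Minutes until Lambda has scaled to `target` concurrent environments,
    given `burst_limit` created immediately and `ramp_per_min` added each
    minute after that. 0 means the burst limit covers the spike outright."""
    if target <= burst_limit:
        return 0
    return math.ceil((target - burst_limit) / ramp_per_min)

print(minutes_to_absorb(5_000, burst_limit=3_000))  # 4 minutes of throttling risk
print(minutes_to_absorb(5_000, burst_limit=500))    # 9 minutes in a low-burst region
```

During those minutes, excess synchronous requests receive 429s, which is exactly why Provisioned Concurrency (or pre-warming) is the answer for predictable large spikes.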
Interview answer template: "Concurrency = RPS × duration. Default pool is 1 000/region. Reserved carves a slice and prevents both starvation and runaway. Provisioned pre-warms to eliminate cold starts, but you pay for idle capacity."