## Concurrency
Account quota, reserved, provisioned. The "100 RPS × 200 ms" math.
### The fundamental model
Lambda concurrency = the number of execution environments processing requests at the same instant. Each environment handles exactly one invocation at a time. There is no thread pool, no event loop shared across invocations — if two requests arrive simultaneously, AWS spins up two separate environments.
The key formula: concurrency ≈ RPS × average duration (in seconds). At 100 requests/s with a 200 ms average handler duration, you need 100 × 0.2 = 20 concurrent environments. At 500 ms average, you need 50. At 2 s average, 200 — and so on. Latency optimisation directly reduces your concurrency footprint.
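This is just Little's law (L = λW) applied to execution environments. A minimal helper to make the arithmetic concrete (the function name is illustrative, not an AWS API):

```python
import math

def required_concurrency(rps: float, avg_duration_s: float) -> int:
    """Estimate the concurrent execution environments needed to absorb
    `rps` requests/second at `avg_duration_s` average handler duration.
    Round up: a fractional environment still occupies a whole one."""
    return math.ceil(rps * avg_duration_s)

print(required_concurrency(100, 0.2))  # 100 RPS x 200 ms -> 20
print(required_concurrency(100, 2.0))  # same traffic at 2 s -> 200
```

Halving the average duration halves the estimate, which is why latency work pays off twice: faster responses and a smaller slice of the account pool.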
### Account concurrency pool
Every AWS account has a regional concurrency quota — default 1 000 concurrent executions per region, shared across all functions. When the pool is full, new invocations get throttled (sync → HTTP 429 TooManyRequestsException; async → queued and retried). Raising the limit requires a Service Quotas increase request; AWS typically grants up to 10 000 with a business justification.
This is the single most common production surprise: one function spikes and starves all others in the same region. Reserved concurrency is the fix.
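Applying the fix is one API call. A sketch using boto3's real `PutFunctionConcurrency` operation — the function name is a placeholder, and the boto3 client is passed in (create one with `boto3.client("lambda")`):

```python
def set_reserved_concurrency(lambda_client, function_name: str, limit: int) -> None:
    """Carve `limit` executions out of the regional pool exclusively for
    this function. It can now never be starved by a noisy neighbour --
    and can never scale past `limit` itself."""
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )

# Usage (requires AWS credentials):
#   import boto3
#   set_reserved_concurrency(boto3.client("lambda"), "checkout-handler", 100)
```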
### Types of concurrency
| Type | What it does | Cost | Use for |
|---|---|---|---|
| Unreserved | Draws from the shared regional pool on demand | Invocation + duration only | Most functions |
| Reserved | Carves a slice of the regional pool exclusively for this function; acts as both a floor and a ceiling | No extra charge | Protecting critical paths from noisy neighbours; throttling cost runaway |
| Provisioned | Pre-warms N environments; they stay initialised 24/7 | Billed for the configured capacity (per GB-second) + invocation + duration | Latency-sensitive functions where cold starts are unacceptable |
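Provisioned concurrency is configured the same way, via boto3's real `PutProvisionedConcurrencyConfig` operation. One wrinkle worth showing: it must target a published version or alias, never `$LATEST`. Function and alias names below are placeholders:

```python
def set_provisioned_concurrency(lambda_client, function_name: str,
                                qualifier: str, count: int) -> None:
    """Keep `count` environments initialised for the given version or
    alias. They are billed while configured, whether or not they serve
    traffic -- this is what eliminates cold starts."""
    lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=qualifier,          # a version number or alias, not $LATEST
        ProvisionedConcurrentExecutions=count,
    )

# Usage (requires AWS credentials):
#   import boto3
#   set_provisioned_concurrency(boto3.client("lambda"), "checkout-handler", "live", 50)
```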
### Reserved concurrency edge cases
- Setting reserved concurrency to 0 disables the function entirely — useful as a circuit breaker.
- Reserved concurrency counts against the account pool even when idle. If you set 500 reserved on a function, only 500 remain for all other functions (at default 1 000).
- Reserved concurrency does not pre-warm. You still cold-start; you just can't scale past the cap.
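The circuit-breaker trick from the first bullet is a pair of real boto3 calls: `PutFunctionConcurrency` with 0 to trip it, `DeleteFunctionConcurrency` to reset (deleting the setting returns the function to the shared pool). Function name is a placeholder:

```python
def trip_breaker(lambda_client, function_name: str) -> None:
    """Reserved concurrency of 0 throttles every invocation -- the
    function is effectively disabled without redeploying anything."""
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=0,
    )

def reset_breaker(lambda_client, function_name: str) -> None:
    """Remove the reserved setting; the function draws from the shared
    regional pool again."""
    lambda_client.delete_function_concurrency(FunctionName=function_name)
```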
### Burst scaling
When traffic spikes from zero, Lambda can spin up environments quickly — but not infinitely fast. The burst limit (region-dependent, typically 500–3 000 immediate) is how many environments AWS will create right now. Beyond that, it adds 500 new environments per minute. A spike from 0 to 5 000 concurrent requests takes several minutes to fully absorb. Provisioned Concurrency or pre-warming via a ping mechanism is the fix for sudden large spikes.
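The ramp-up model above (an immediate burst, then +500 environments per minute) can be turned into a back-of-envelope estimate of how long a spike stays partially throttled:

```python
import math

def minutes_to_absorb(target: int, burst_limit: int, ramp_per_min: int = 500) -> int:
    """Minutes until Lambda has scaled to `target` concurrent environments,
    given `burst_limit` created immediately and `ramp_per_min` added each
    minute after that. 0 means the burst limit covers the spike outright."""
    if target <= burst_limit:
        return 0
    return math.ceil((target - burst_limit) / ramp_per_min)

print(minutes_to_absorb(5_000, burst_limit=3_000))  # 4 minutes of throttling risk
print(minutes_to_absorb(5_000, burst_limit=500))    # 9 minutes in a low-burst region
```

During those minutes, excess synchronous requests receive 429s, which is exactly why Provisioned Concurrency (or pre-warming) is the answer for predictable large spikes.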
Interview answer template: "Concurrency = RPS × duration. Default pool is 1 000/region. Reserved carves a slice and prevents both starvation and runaway. Provisioned pre-warms to eliminate cold starts, but you pay for idle capacity."