# Concurrency
> Account quota, reserved, provisioned. The "100 RPS × 200 ms" math.
## The fundamental model
Lambda concurrency = the number of execution environments processing requests at the same instant. Each environment handles exactly one invocation at a time. There is no thread pool, no event loop shared across invocations — if two requests arrive simultaneously, AWS spins up two separate environments.
The key formula: **concurrency ≈ RPS × average duration (in seconds)**. At 100 requests/s with a 200 ms average handler duration, you need 100 × 0.2 = **20 concurrent environments**. At 500 ms average, you need 50. At 2 s average, 200 — and so on. Latency optimisation directly reduces your concurrency footprint.
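This is Little's law applied to Lambda: concurrency in the system equals arrival rate times time spent in the system. A minimal sketch of the arithmetic (the function name is ours, not an AWS API):

```python
def required_concurrency(rps: float, avg_duration_ms: float) -> float:
    """Little's law for Lambda: concurrent environments needed is the
    arrival rate (req/s) times the average handler duration (s)."""
    return rps * (avg_duration_ms / 1000)

print(required_concurrency(100, 200))   # 20.0 environments
print(required_concurrency(100, 500))   # 50.0
print(required_concurrency(100, 2000))  # 200.0
```

Note the symmetry: halving your p50 duration halves your concurrency footprint at the same traffic level.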
## Account concurrency pool
Every AWS account has a regional concurrency quota — default **1 000 concurrent executions** per region, shared across all functions. When the pool is full, new invocations get throttled (sync → HTTP 429 TooManyRequestsException; async → queued and retried). Raising the limit requires a Service Quotas increase request; AWS typically grants up to 10 000 with a business justification.
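Because synchronous callers receive the 429 directly, they are responsible for retrying. A sketch of the usual pattern, jittered exponential backoff, with a stand-in exception class rather than a real SDK call:

```python
import random
import time

class ThrottledError(Exception):
    """Stand-in for the HTTP 429 TooManyRequestsException a sync invoke gets."""

def invoke_with_backoff(invoke, max_attempts=5, base_delay=0.1):
    """Retry a synchronous invocation on throttling. `invoke` is any
    callable (e.g. a wrapper around your SDK's Invoke call) that raises
    ThrottledError when the regional pool is exhausted."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            # 0.1 s, 0.2 s, 0.4 s, ... with +/-50% jitter to avoid a thundering herd
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

Async invocations get this behaviour for free: Lambda queues the event and retries internally.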
This is the single most common production surprise: one function spikes and starves all others in the same region. Reserved concurrency is the fix.
## Types of concurrency
| Type | What it does | Cost | Use for |
|------|--------------|------|---------|
| **Unreserved** | Draws from the shared regional pool on demand | Invocation + duration only | Most functions |
| **Reserved** | Carves a slice of the regional pool exclusively for this function; acts as both a floor and a ceiling | No extra charge | Protecting critical paths from noisy neighbours; throttling cost runaway |
| **Provisioned** | Pre-warms N environments; they stay initialised 24/7 | PC-hours + invocation | Latency-sensitive functions where cold starts are unacceptable |
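The "Use for" column boils down to two questions. A hypothetical decision helper (ours, not an AWS construct) that encodes the same rule of thumb:

```python
def choose_concurrency_mode(cold_start_sensitive: bool,
                            needs_isolation: bool) -> str:
    """Rule of thumb from the table above: pay for provisioned only when
    cold starts are unacceptable; reserve when a function must be
    protected from (or contained as) a noisy neighbour; otherwise
    draw from the shared pool."""
    if cold_start_sensitive:
        return "provisioned"
    if needs_isolation:
        return "reserved"
    return "unreserved"
```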
## Reserved concurrency edge cases
- Setting reserved concurrency to **0** disables the function entirely — useful as a circuit breaker.
- Reserved concurrency counts against the account pool even when idle. If you set 500 reserved on a function, only 500 remain for all other functions (at default 1 000).
- Reserved concurrency does **not** pre-warm. You still cold-start; you just can't scale past the cap.
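The second edge case above is worth making concrete: reserved capacity is subtracted from the account pool whether or not it is used. A small accounting sketch (function name is ours):

```python
def unreserved_pool(account_limit: int, reserved: dict) -> int:
    """Shared capacity left for all other functions after reserved
    carve-outs. Reserved slices count against the account limit even
    when those functions sit completely idle."""
    return account_limit - sum(reserved.values())

unreserved_pool(1000, {"critical-api": 500})  # 500 left for everything else
```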
## Burst scaling
When traffic spikes from zero, Lambda can spin up environments quickly — but not infinitely fast. The burst limit (region-dependent, typically 500–3 000 immediate) is how many environments AWS will create right now. Beyond that, it adds **500 new environments per minute**. A spike from 0 to 5 000 concurrent requests takes several minutes to fully absorb. Provisioned Concurrency or pre-warming via a ping mechanism is the fix for sudden large spikes.
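Using the figures above (a 500-environment immediate burst, then 500 more per minute — both region-dependent), the absorption time works out like this:

```python
import math

def seconds_to_absorb(target: int, burst_limit: int = 500,
                      per_minute: int = 500) -> float:
    """Rough time for Lambda to scale from zero to `target` concurrent
    environments: `burst_limit` immediately, then `per_minute` more
    each minute. Figures are illustrative defaults, not guarantees."""
    if target <= burst_limit:
        return 0.0
    return math.ceil((target - burst_limit) / per_minute) * 60.0

print(seconds_to_absorb(5000))  # 540.0 — nine minutes to reach 5 000
```

Nine minutes of partial throttling is exactly the window Provisioned Concurrency is designed to close.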
> **Interview answer template:** "Concurrency = RPS × duration. Default pool is 1 000/region. Reserved carves a slice and prevents both starvation and runaway. Provisioned pre-warms to eliminate cold starts, but you pay for idle capacity."