
# Limits — Cheatsheet

> Every number worth memorising. The "why it matters" column is the part interviews actually probe.

## Per-function compute & storage

| Limit | Default | Max | Why it matters |
|-------|---------|-----|----------------|
| Memory | 128 MB | 10 240 MB | CPU scales in proportion to memory, so more memory is more than extra headroom: at 1769 MB a function gets the equivalent of one full vCPU, and up to six at the 10 240 MB ceiling. Bumping memory is often *cheaper* for CPU-bound work because duration can drop faster than the per-millisecond price rises. |
| Timeout | 3 s | 900 s (15 min) | The 3 s default is too short for almost anything that talks to S3. Set it explicitly; don't accept the default. API Gateway's 29 s default integration timeout still applies in front of the function, whatever the Lambda timeout says (see below). |
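| Ephemeral storage (/tmp) | 512 MB | 10 240 MB | Persists across warm invocations in the same execution environment and is gone when a new environment cold-starts. Not shared between concurrent environments. Anything configured above 512 MB is billed extra, in proportion to the configured size and the invocation duration. |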
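| Init phase | 10 s hard cap | 10 s hard cap | Module-level code (imports, client construction). Not configurable for on-demand functions: if init overruns, Lambda retries it during the first invocation, where it counts against the function timeout. Heavy ML model loads and custom JIT warmups need measuring or you'll trip this. |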
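
The memory row's claim that more memory can be cheaper is easy to check with arithmetic. The sketch below is only illustrative: the price constant and both duration measurements are assumptions, not figures from this cheatsheet, and it ignores the per-request fee.

```python
# Illustrative x86 price per GB-second. This is an assumption for the example;
# check current pricing for your region and architecture.
PRICE_PER_GB_SECOND = 0.0000166667


def invocation_cost(memory_mb: int, duration_ms: float,
                    price_per_gb_second: float = PRICE_PER_GB_SECOND) -> float:
    """Compute-only cost of one invocation (per-request fee not included)."""
    gb = memory_mb / 1024
    seconds = duration_ms / 1000
    return gb * seconds * price_per_gb_second


# Hypothetical measurements for a CPU-bound handler:
# 4x the memory, but the run finishes 4.8x faster.
small = invocation_cost(memory_mb=512, duration_ms=2400)   # 1.2 GB-seconds
large = invocation_cost(memory_mb=2048, duration_ms=500)   # 1.0 GB-seconds

print(f"512 MB:  ${small:.10f} per invocation")
print(f"2048 MB: ${large:.10f} per invocation")
print("larger config is cheaper" if large < small else "smaller config is cheaper")
```

The comparison only holds when the handler is actually CPU-bound. For I/O-bound work the duration barely moves, so measure before changing the setting.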

## Payloads & responses

| Limit | Value | Why it matters |
|-------|-------|----------------|
| Sync invocation request | 6 MB | Hard cap on the event body for `RequestResponse` invocations. |
| Sync invocation response | 6 MB | Not truncated: above this the invocation fails with a `Function.ResponseSizeTooLarge` error, so your handler "succeeds" but the caller gets an error instead of the payload. `lambda_function.py` sidesteps this by returning a manifest URL instead of inlining all presigned URLs. |
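| Async invocation event | 256 KB | For `Event` invocations, including services that invoke Lambda asynchronously (S3, EventBridge, SNS). Polling event source mappings (SQS, Kinesis, DynamoDB Streams) invoke synchronously, so their batches fall under the 6 MB cap instead. |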
| Response streaming | 20 MB (soft), bandwidth-capped | Function URLs in `RESPONSE_STREAM` invoke mode, or the `InvokeWithResponseStream` API, get past the 6 MB cap by flushing chunks. Bandwidth is throttled after the first 6 MB. Not all clients/SDKs support it. |
| Environment variables | 4 KB total | Per function, all keys+values combined. Big config → Parameter Store / Secrets Manager. |
| Event size (SQS, SNS, EventBridge) | 256 KB each | Producer-side limit. Larger payloads → store in S3, send a pointer. |
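
The "store it, send a pointer" advice in the table above can be sketched as a size guard on the response. This is a minimal illustration, not the code in `lambda_function.py`: `build_response`, the `upload` callback, and the safety margin are all hypothetical names and values. The callback stands in for whatever actually writes to S3 and returns a location.

```python
import json
from typing import Any, Callable

SYNC_RESPONSE_LIMIT = 6 * 1024 * 1024  # 6 MB sync response cap
SAFETY_MARGIN = 64 * 1024              # hypothetical headroom, an assumption


def build_response(result: Any,
                   upload: Callable[[bytes], str],
                   limit: int = SYNC_RESPONSE_LIMIT - SAFETY_MARGIN) -> dict:
    """Return the result inline if it fits, otherwise a pointer to it."""
    body = json.dumps(result).encode("utf-8")
    if len(body) <= limit:
        return {"inline": True, "body": body.decode("utf-8")}
    # Too large for a sync response: hand the bytes to the uploader
    # and return only where they ended up.
    location = upload(body)
    return {"inline": False, "location": location, "size_bytes": len(body)}


def fake_upload(data: bytes) -> str:
    """Stand-in uploader for local runs; a real one would write to S3."""
    return "s3://example-bucket/results/manifest.json"


print(build_response({"status": "ok"}, fake_upload)["inline"])
print(build_response({"blob": "a" * (7 * 1024 * 1024)}, fake_upload)["inline"])
```

Measuring the serialized bytes rather than the Python object matters, because the limit applies to the encoded payload.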

## Packaging

| Limit | Value | Why it matters |
|-------|-------|----------------|
| Zip upload (direct) | 50 MB | Above this you must upload via S3 first. |
| Zip unzipped (function + layers) | 250 MB | Total of `/var/task` plus all layers extracted under `/opt`. `aioboto3`+deps is ~50 MB, so there is headroom, but it is finite. |
| Container image | 10 GB | Per image. Preferred when you'd otherwise blow the 250 MB zip ceiling — e.g. ML deps with native binaries. |
| Layers | 5 per function | Ordering matters: later layers overwrite earlier. Layers count toward the 250 MB unzipped cap. |
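
Since layers count toward the same 250 MB ceiling, it can help to total the unzipped size before deploying. The following is a rough sketch using only the standard library; the function names are hypothetical, and it sums the sizes recorded in each archive rather than extracting anything.

```python
import zipfile

UNZIPPED_LIMIT = 250 * 1024 * 1024  # function + all layers, extracted


def unzipped_size(zip_source) -> int:
    """Sum of uncompressed file sizes recorded in a zip (path or file object)."""
    with zipfile.ZipFile(zip_source) as zf:
        return sum(info.file_size for info in zf.infolist())


def check_package(function_zip, layer_zips=()) -> dict:
    """Total the function zip and its layers against the unzipped limit."""
    total = unzipped_size(function_zip)
    total += sum(unzipped_size(layer) for layer in layer_zips)
    return {
        "unzipped_bytes": total,
        "fits": total <= UNZIPPED_LIMIT,
        "headroom_bytes": UNZIPPED_LIMIT - total,
    }
```

A call such as `check_package("function.zip", ["deps-layer.zip"])` (both file names are placeholders) returns the total and the remaining headroom.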

## Concurrency & scaling

| Limit | Default | Notes |
|-------|---------|-------|
| Account concurrent executions | 1 000 / region | Soft quota — request increase via Service Quotas. The single most common throttling cause in production. |
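| Concurrency scaling rate | 1 000 envs / 10 s per function | How fast AWS adds fresh environments during a traffic spike. Each function scales independently, up to the account limit. This replaced the older model of a 500–3 000 region-dependent burst followed by +500 envs / min. |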
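| Reserved concurrency | 0 to account quota minus 100 | Carves a slice of the account pool for a function and also caps that function at the same number. At least 100 must stay unreserved. Setting it to 0 throttles every invocation, which effectively disables the function. |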
| Provisioned concurrency | 0 by default | Pre-warmed envs. Eliminates cold starts at the cost of paying for idle capacity. Bills as PC-seconds + invocation cost. |
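
To judge whether a workload will hit the account quota, a common estimate is concurrency ≈ requests per second × average duration in seconds. The sketch below applies that estimate; the traffic figures are made up for illustration and the function name is hypothetical.

```python
import math

ACCOUNT_CONCURRENCY_DEFAULT = 1000  # default per-region quota from the table


def required_concurrency(requests_per_second: float, avg_duration_s: float) -> int:
    """Estimate steady-state concurrent executions for a given load."""
    return math.ceil(requests_per_second * avg_duration_s)


# Hypothetical workloads
steady = required_concurrency(requests_per_second=400, avg_duration_s=0.25)
spiky = required_concurrency(requests_per_second=2500, avg_duration_s=0.5)

for name, needed in (("steady", steady), ("spiky", spiky)):
    verdict = "fits" if needed <= ACCOUNT_CONCURRENCY_DEFAULT else "exceeds"
    print(f"{name}: {needed} concurrent executions, {verdict} the default quota")
```

The estimate is an average. Short bursts and slow downstream calls push real concurrency higher, so treat the result as a floor.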

## Time & rate limits at the edges

| Surface | Limit | Why it matters |
|---------|-------|----------------|
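| API Gateway integration timeout | 29 s (default) | Caps your effective Lambda timeout when fronted by API GW, regardless of what the Lambda timeout says. Regional and private REST APIs can request a higher value; treat 29 s as the ceiling unless you have done so. Function URLs allow up to 15 min. |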
| Async invocation event age | 6 h | If retries don't succeed in this window, the event is dropped (or sent to DLQ / on-failure destination). |
| Async retry attempts | 2 (default) | Total of 3 attempts (initial + 2). Configurable down to 0. |
| SQS visibility timeout | ≥ 6× function timeout | An AWS recommendation, not an enforced quota. With a shorter visibility timeout, messages can reappear on the queue while still being processed. |
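
Two of the rows above reduce to simple arithmetic that is worth encoding in a config check. This is a minimal sketch under the defaults listed in the table; the helper names are hypothetical.

```python
API_GATEWAY_TIMEOUT_S = 29     # default integration timeout from the table
SQS_VISIBILITY_MULTIPLIER = 6  # recommended multiple of the function timeout


def effective_timeout_s(lambda_timeout_s: int, behind_api_gateway: bool) -> int:
    """Longest a caller will actually wait for the function."""
    if behind_api_gateway:
        return min(lambda_timeout_s, API_GATEWAY_TIMEOUT_S)
    return lambda_timeout_s


def min_visibility_timeout_s(lambda_timeout_s: int) -> int:
    """Recommended lower bound for the queue's visibility timeout."""
    return SQS_VISIBILITY_MULTIPLIER * lambda_timeout_s


print(effective_timeout_s(900, behind_api_gateway=True))
print(min_visibility_timeout_s(30))
```

A 15-minute Lambda timeout behind API Gateway therefore still behaves like a 29-second one from the caller's side, and a 30-second function wants a queue visibility timeout of at least three minutes.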

> **Memorisation hack.** Three numbers cover most interview questions: **15 minutes** (timeout), **10 GB** (memory and /tmp ceiling), **6 MB** (sync payload). Everything else is a footnote until you hit a specific design.