Async & Errors

Sync vs async invoke. Retries, DLQ, destinations, idempotency, partial-batch failures.

Sync vs async invocation

|  | Synchronous (RequestResponse) | Asynchronous (Event) |
|---|---|---|
| Caller blocks? | Yes — waits for result | No — gets 202 immediately |
| Response visible to caller? | Yes | No |
| Retries on error | None (caller's responsibility) | 2 retries = 3 total attempts |
| Retry backoff | N/A | ~1 min, then ~2 min |
| Event age limit | N/A | 6 hours |
| Max event size | 6 MB | 256 KB |

Async retry flow

When a function is invoked asynchronously and throws an unhandled exception (or is throttled), Lambda retries automatically — twice, with exponential backoff starting at ~1 minute. If all three attempts fail, or if the event ages past 6 hours, Lambda sends the event to the configured failure destination or DLQ. If neither is configured, the event is silently dropped.
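The schedule above can be sketched as a toy helper (not an AWS API) that mirrors the defaults: two retries, with delays roughly doubling from one minute:

```python
# Toy model of the async retry schedule (not an AWS API):
# two retries, exponential backoff starting at ~1 minute.
def retry_delays(max_retries: int = 2, base_delay_s: int = 60) -> list[int]:
    """Approximate wait before each retry attempt, in seconds."""
    return [base_delay_s * (2 ** i) for i in range(max_retries)]

print(retry_delays())  # first retry after ~60 s, the second ~120 s later
```

Real delays include jitter, and the whole sequence is bounded by the 6-hour maximum event age.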

DLQ vs Destinations

These are two different mechanisms that overlap in purpose but have different capabilities:

|  | Dead-Letter Queue (DLQ) | Event Destinations |
|---|---|---|
| Introduced | 2016 (legacy) | 2019 (preferred) |
| Triggers on | Failure only | Success or failure (separate configs) |
| Payload | The original event only | Original event + result/error + metadata |
| Targets | SQS or SNS | SQS, SNS, Lambda, EventBridge |

Use Destinations for new code. DLQ remains useful when the downstream consumer must be SQS and you don't need success notifications.
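Note that an OnFailure destination receives an envelope around the original event, not the bare payload. A sketch of pulling the useful fields out of that envelope (the example values here are invented):

```python
import json

# Invented example of the envelope delivered to an OnFailure destination.
failure_record = json.loads("""{
  "version": "1.0",
  "timestamp": "2024-01-01T00:00:00.000Z",
  "requestContext": {
    "requestId": "00000000-0000-0000-0000-000000000000",
    "functionArn": "arn:aws:lambda:us-east-1:123456789012:function:demo:$LATEST",
    "condition": "RetriesExhausted",
    "approximateInvokeCount": 3
  },
  "requestPayload": {"orderId": "abc-123"},
  "responseContext": {"statusCode": 200, "executedVersion": "$LATEST",
                      "functionError": "Unhandled"},
  "responsePayload": {"errorMessage": "boom", "errorType": "RuntimeError"}
}""")

# The original event survives in requestPayload; the error details
# (absent from a DLQ message) arrive in responsePayload.
original_event = failure_record["requestPayload"]
why_it_failed = failure_record["responsePayload"]["errorType"]
```

This extra context is the main practical advantage over a DLQ, which forwards only the original event.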

Idempotency

Because async invocations retry and most event sources are at-least-once, your handler will occasionally execute more than once for the same logical event. Design handlers to be idempotent — the same input produces the same outcome regardless of how many times it runs.

Standard pattern: use a unique key from the event (S3 ETag + key, SQS MessageId, EventBridge detail.id) as a deduplication key. On first execution, write the key + result to DynamoDB with a TTL. On retry, check DynamoDB first — if already processed, return the cached result without re-running the work.

```python
# Sketch of the dedup pattern; assumes a DynamoDB table with
# partition key "id" (table name here is hypothetical).
import time
import boto3

table = boto3.resource("dynamodb").Table("idempotency-store")

def handler(event, context):
    dedup_key = event["Records"][0]["messageId"]
    existing = table.get_item(Key={"id": dedup_key})
    if "Item" in existing:
        return existing["Item"]["result"]  # already processed: return cached result

    result = do_the_work(event)
    table.put_item(Item={"id": dedup_key, "result": result,
                         "ttl": int(time.time()) + 86400})  # expire after 24 h
    return result
```

AWS PowerTools for Lambda (Python) has a built-in @idempotent decorator that implements this pattern with DynamoDB.
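One subtlety in the get-then-put pattern: two concurrent deliveries can both miss the read and run the work twice. In DynamoDB this is closed with a conditional write (`ConditionExpression="attribute_not_exists(id)"`) that reserves the key before the work runs, which is roughly how the decorator works. A simplified in-memory stand-in for that control flow (all names illustrative):

```python
class AlreadyProcessed(Exception):
    """Raised when a second writer loses the conditional put."""

def put_if_absent(store: dict, key: str, value) -> None:
    # Stand-in for DynamoDB put_item with
    # ConditionExpression="attribute_not_exists(id)".
    if key in store:
        raise AlreadyProcessed(key)
    store[key] = value

def idempotent_process(store: dict, dedup_key: str, work):
    try:
        put_if_absent(store, dedup_key, "IN_PROGRESS")  # reserve the key first
    except AlreadyProcessed:
        # Another attempt already claimed this key; a real implementation
        # would also handle a still-IN_PROGRESS marker (wait or raise).
        return store[dedup_key]
    result = work()
    store[dedup_key] = result  # replace the marker with the real result
    return result
```

Reserving before executing is what prevents duplicate *work*, not just duplicate *records of work*.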

Partial batch failures (SQS / Kinesis / DynamoDB Streams)

When Lambda processes a batch of records and one record fails, the default behaviour differs by source:

  • SQS (default): if the handler raises an exception, the entire batch is retried. One bad message blocks all others and can cause infinite retry loops.
  • With ReportBatchItemFailures enabled: return a batchItemFailures list containing only the failed message IDs. Lambda re-queues only those; successful messages are deleted.
```python
def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process(record)
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```
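A quick local simulation of that handler with a fake SQS event containing one poison record (names and bodies are illustrative) shows only the failed ID being reported:

```python
def process(record):
    # Illustrative worker: one message body is unparseable.
    if record["body"] == "poison":
        raise ValueError("unparseable message")

def handler(event, context=None):
    failures = []
    for record in event["Records"]:
        try:
            process(record)
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

fake_event = {"Records": [
    {"messageId": "m-1", "body": "ok"},
    {"messageId": "m-2", "body": "poison"},
    {"messageId": "m-3", "body": "ok"},
]}
print(handler(fake_event))
# only m-2 is returned for redelivery; m-1 and m-3 are deleted from the queue
```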

Enable ReportBatchItemFailures in the ESM configuration and always implement partial-batch failure reporting for SQS and Kinesis handlers. A single poison-pill record can otherwise block an entire shard or queue indefinitely.

⚠️ The idempotency / partial-batch intersection: with partial failures, successful records in the batch are deleted from SQS, but if your function crashes before returning the failure list, the entire batch including the successes gets retried. Idempotency guards must still cover every record, not just the ones in batchItemFailures.
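Combining the two guards, the sketch below marks every record it completes in a dedup store (an in-memory dict standing in for DynamoDB), so a full-batch redelivery after a crash skips the work that already succeeded. Names are illustrative:

```python
processed = {}  # stand-in for a DynamoDB dedup table keyed by messageId

def process(record):
    return record["body"].upper()

def handler(event, context=None):
    failures = []
    for record in event["Records"]:
        key = record["messageId"]
        if key in processed:
            continue  # redelivered after a crash: work already done, skip it
        try:
            processed[key] = process(record)
        except Exception:
            failures.append({"itemIdentifier": key})
    return {"batchItemFailures": failures}
```

The per-record check is what makes a crash-triggered full-batch retry harmless.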