| 6 | Sync file handle as `Body` | Flat memory for any manifest size |
| 7 | `PageSize=1000` | Fewer S3 round-trips on large prefixes |
| 8 | N concurrent consumers via `asyncio.gather` | Presign throughput scales with `concurrency` |
---
## Further improvements (not yet applied)
These are the next natural steps if this function were going to production. They were left out intentionally — each adds infrastructure or AWS-side configuration that goes beyond the handler itself.
### Idempotency
The function is not idempotent in the strict sense. Each invocation with the same `(bucket, prefix)` event produces a new manifest at a new UUID key. If Lambda retries the invocation (async invocations retry up to 2 times by default, and S3/SNS/EventBridge are at-least-once), you accumulate duplicate manifests in the `manifests/` prefix.
The standard fix is a DynamoDB dedup table:
```python
import hashlib
import json
import os
import time

import boto3 as _boto3

_ddb = _boto3.resource("dynamodb")
_table = _ddb.Table(os.environ["DEDUP_TABLE"])


def _dedup_key(cfg: dict) -> str:
    raw = f"{cfg['bucket']}#{cfg['prefix']}"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]


# at the top of _run(), before any S3 work:
dedup_key = _dedup_key(cfg)
resp = _table.get_item(Key={"id": dedup_key})
if "Item" in resp:
    return json.loads(resp["Item"]["result"])  # cached — skip all S3 work

# ... do the work ...

# at the end, before returning:
_table.put_item(Item={
    "id": dedup_key,
    "result": json.dumps(result),
    "ttl": int(time.time()) + 86400,  # dedup record expires after 24 hours
})
```
The dedup key is derived from the logical job identity (`bucket + prefix`), not the `request_id`. Using `request_id` would only guard against Lambda's own retries of the same invocation; using the business key guards against a caller submitting the same job twice.
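To make that distinction concrete, here is a small self-contained check (the bucket and prefix values are made up) showing that two deliveries of the same job collapse to one dedup record, while a different prefix gets a fresh one:

```python
import hashlib

def _dedup_key(cfg: dict) -> str:
    # same derivation as above: business identity only, no request_id
    raw = f"{cfg['bucket']}#{cfg['prefix']}"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]

# two deliveries of the same job (different Lambda request_ids) share one key
a = _dedup_key({"bucket": "my-bucket", "prefix": "raw/2024/"})
b = _dedup_key({"bucket": "my-bucket", "prefix": "raw/2024/"})
c = _dedup_key({"bucket": "my-bucket", "prefix": "raw/2025/"})
assert a == b        # retry or duplicate submission: cache hit
assert a != c        # different job: fresh run
assert len(a) == 32  # truncated hex digest fits a small DynamoDB key
```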
AWS PowerTools for Lambda has a built-in `@idempotent` decorator that implements this exact pattern, including TTL management and in-progress locking.
**What it requires:** a DynamoDB table, `dynamodb:GetItem` + `dynamodb:PutItem` IAM permissions on the execution role, and the PowerTools layer (or `aws-lambda-powertools` in requirements).
### Manifest lifecycle rule
Every invocation writes a new object under `manifests/`. Without cleanup, this prefix grows unbounded. The fix is an S3 lifecycle rule on the bucket — not a handler change:
```json
{
  "Rules": [{
    "ID": "expire-manifests",
    "Filter": { "Prefix": "manifests/" },
    "Status": "Enabled",
    "Expiration": { "Days": 1 }
  }]
}
```
Objects under `manifests/` are deleted by S3 automatically after 1 day. The presigned URLs in those manifests are already short-lived (15 minutes by default), so there's no reason to keep the manifest longer than the URL validity window.
**What it requires:** a `PutBucketLifecycleConfiguration` call during infrastructure provisioning (CDK/Terraform/console) — nothing in the handler.
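For reference, the same rule can be applied from provisioning code via boto3's `put_bucket_lifecycle_configuration`. A sketch (the bucket name is hypothetical; the builder just reproduces the JSON rule above):

```python
def manifest_lifecycle_config(days: int = 1) -> dict:
    """Build the lifecycle configuration matching the JSON rule above."""
    return {
        "Rules": [{
            "ID": "expire-manifests",
            "Filter": {"Prefix": "manifests/"},
            "Status": "Enabled",
            "Expiration": {"Days": days},
        }]
    }

# one-time provisioning call (needs s3:PutLifecycleConfiguration on the bucket):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket",  # hypothetical bucket name
#     LifecycleConfiguration=manifest_lifecycle_config(),
# )
```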
### Idempotency + manifest lifecycle together
With both in place: a retry returns the same `manifest_url` pointing to the same (still-live) manifest object; after 24 hours the manifest is gone and the dedup record has expired, so the next invocation starts fresh. The combination is clean.
### ReportBatchItemFailures (SQS only)
If this function were triggered by an SQS event source mapping (one message = one `(bucket, prefix)` job), the consumer-level `errors` field isn't enough — Lambda needs to know *which SQS messages* failed so it can re-queue only those. Return a `batchItemFailures` list instead of raising:
```python
def handler(event, context):
    failures = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        try:
            asyncio.run(_run(body, context.aws_request_id))
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```
Without this, a single failed message causes the entire batch to retry, including messages that succeeded — work is repeated and the queue can stall on a poison-pill message indefinitely.
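The partial-batch behavior can be exercised locally with a stubbed `_run` and a minimal fake context (both are hypothetical stand-ins, not the real handler's dependencies), confirming that only the failing message is reported back:

```python
import asyncio
import json

async def _run(cfg, request_id):
    # hypothetical stub: fail one specific prefix, succeed otherwise
    if cfg["prefix"] == "poison/":
        raise RuntimeError("simulated failure")
    return {"ok": True}

def handler(event, context):
    failures = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        try:
            asyncio.run(_run(body, context.aws_request_id))
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

class _Ctx:  # minimal stand-in for the Lambda context object
    aws_request_id = "test-req"

event = {"Records": [
    {"messageId": "m1", "body": json.dumps({"bucket": "b", "prefix": "ok/"})},
    {"messageId": "m2", "body": json.dumps({"bucket": "b", "prefix": "poison/"})},
]}
out = handler(event, _Ctx())
assert out == {"batchItemFailures": [{"itemIdentifier": "m2"}]}
```

With `ReportBatchItemFailures` enabled, SQS re-queues only `m2`; `m1` is deleted as successfully processed.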
**What it requires:** `ReportBatchItemFailures` enabled on the ESM configuration (CDK/Terraform) and restructuring the handler to iterate over `event["Records"]`. Not applicable to direct (RequestResponse) invocations like the local tester uses.
### arm64 / Graviton2
No code change needed. Switch the function's architecture to `arm64` in the deployment config:
```yaml
# SAM
Architectures: [arm64]
```
Graviton2 costs ~20% less per GB-second and typically runs the init phase ~10% faster. The only blocker is native-code wheels: `aiobotocore` ships pure Python so there's no binary incompatibility here. Worth doing as a zero-effort cost and latency win.