| # | Change | Benefit |
|---|---|---|
| 6 | Sync file handle as `Body` | Flat memory for any manifest size |
| 7 | `PageSize=1000` | Fewer S3 round-trips on large prefixes |
| 8 | N concurrent consumers via `asyncio.gather` | Presign throughput scales with `concurrency` |
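
The pattern in row 8 is a fixed pool of consumers draining a shared queue. A minimal standalone sketch of that shape, where `presign` is a hypothetical stand-in for the real per-key S3 presigning call:

```python
import asyncio

async def presign(key: str) -> str:
    # Hypothetical stand-in for the real presigning call; just echoes the key.
    await asyncio.sleep(0)
    return f"https://example.com/{key}?sig=..."

async def run_pool(keys, concurrency: int = 8):
    queue: asyncio.Queue = asyncio.Queue()
    for key in keys:
        queue.put_nowait(key)
    results = []

    async def consumer():
        # Each consumer pulls keys until the queue is drained.
        while True:
            try:
                key = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            results.append(await presign(key))

    # N concurrent consumers; throughput scales with `concurrency`.
    await asyncio.gather(*(consumer() for _ in range(concurrency)))
    return results

urls = asyncio.run(run_pool([f"k{i}" for i in range(5)], concurrency=2))
print(len(urls))  # 5
```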
---

## Further improvements (not yet applied)

These are the next natural steps if this function were going to production. They were left out intentionally — each adds infrastructure or AWS-side configuration that goes beyond the handler itself.

### Idempotency

The function is not idempotent in the strict sense. Each invocation with the same `(bucket, prefix)` event produces a new manifest at a new UUID key. If Lambda retries the invocation (async invocations retry up to 2 times by default, and S3/SNS/EventBridge are at-least-once), you accumulate duplicate manifests in the `manifests/` prefix.

The standard fix is a DynamoDB dedup table:
```python
import hashlib
import json
import os
import time

import boto3 as _boto3

_ddb = _boto3.resource("dynamodb")
_table = _ddb.Table(os.environ["DEDUP_TABLE"])


def _dedup_key(cfg: dict) -> str:
    raw = f"{cfg['bucket']}#{cfg['prefix']}"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]


# at the top of _run(), before any S3 work:
dedup_key = _dedup_key(cfg)
resp = _table.get_item(Key={"id": dedup_key})
if "Item" in resp:
    return json.loads(resp["Item"]["result"])  # cached — skip all S3 work

# ... do the work ...

# at the end, before returning:
_table.put_item(Item={
    "id": dedup_key,
    "result": json.dumps(result),
    "ttl": int(time.time()) + 86400,  # requires TTL enabled on the table
})
```

The dedup key is derived from the logical job identity (`bucket + prefix`), not the `request_id`. Using `request_id` would only guard against Lambda's own retries of the same invocation; using the business key guards against a caller submitting the same job twice.

Powertools for AWS Lambda has a built-in `@idempotent` decorator that implements this exact pattern, including TTL management and in-progress locking.

**What it requires:** a DynamoDB table, `dynamodb:GetItem` + `dynamodb:PutItem` IAM permissions on the execution role, and the Powertools layer (or `aws-lambda-powertools` in requirements).
### Manifest lifecycle rule

Every invocation writes a new object under `manifests/`. Without cleanup, this prefix grows unbounded. The fix is an S3 lifecycle rule on the bucket — not a handler change:
```json
{
  "Rules": [{
    "ID": "expire-manifests",
    "Filter": { "Prefix": "manifests/" },
    "Status": "Enabled",
    "Expiration": { "Days": 1 }
  }]
}
```

Objects under `manifests/` are deleted by S3 automatically after 1 day. The presigned URLs in those manifests are already short-lived (15 minutes by default), so there's no reason to keep the manifest longer than the URL validity window.

**What it requires:** a `PutBucketLifecycleConfiguration` call during infrastructure provisioning (CDK/Terraform/console) — nothing in the handler.

### Idempotency + manifest lifecycle together

With both in place: a retry returns the same `manifest_url` pointing to the same (still-live) manifest object; after 24 hours the manifest is gone and the dedup record has expired, so the next invocation starts fresh. The combination is clean.

### ReportBatchItemFailures (SQS only)

If this function were triggered by an SQS event source mapping (one message = one `(bucket, prefix)` job), the consumer-level `errors` field isn't enough — Lambda needs to know *which SQS messages* failed so it can re-queue only those. Return a `batchItemFailures` list instead of raising:
```python
def handler(event, context):
    failures = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        try:
            asyncio.run(_run(body, context.aws_request_id))
        except Exception:
            # Report only this message as failed; successful ones are not re-queued.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```
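
The behavior can be checked locally by stubbing `_run` (the stub, the context object, and the event below are all hypothetical): the poison message fails alone while its batch-mate succeeds.

```python
import asyncio
import json

async def _run(cfg, request_id):
    # Hypothetical stub: fail on one specific prefix to simulate a poison pill.
    if cfg["prefix"] == "bad/":
        raise RuntimeError("boom")
    return {"ok": True}

def handler(event, context):
    failures = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        try:
            asyncio.run(_run(body, context.aws_request_id))
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

class Ctx:
    aws_request_id = "req-1"

event = {"Records": [
    {"messageId": "m1", "body": json.dumps({"bucket": "b", "prefix": "ok/"})},
    {"messageId": "m2", "body": json.dumps({"bucket": "b", "prefix": "bad/"})},
]}
print(handler(event, Ctx()))  # {'batchItemFailures': [{'itemIdentifier': 'm2'}]}
```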

Without this, a single failed message causes the entire batch to retry, including messages that succeeded — work is repeated and the queue can stall on a poison-pill message indefinitely.

**What it requires:** `ReportBatchItemFailures` enabled on the ESM configuration (CDK/Terraform) and restructuring the handler to iterate over `event["Records"]`. Not applicable to direct (RequestResponse) invocations like the local tester uses.

### arm64 / Graviton2

No code change needed. Switch the function's architecture to `arm64` in the deployment config:
```yaml
# SAM
Architectures: [arm64]
```

Graviton2 costs ~20% less per GB-second and typically runs the init phase ~10% faster. The only blocker is native-code wheels: `aiobotocore` ships pure Python so there's no binary incompatibility here. Worth doing as a zero-effort cost and latency win.