diff --git a/docs/lambdas-md/lambda-20-sign-pdfs-optimized.md b/docs/lambdas-md/lambda-20-sign-pdfs-optimized.md
index bd78712..4e2f7c4 100644
--- a/docs/lambdas-md/lambda-20-sign-pdfs-optimized.md
+++ b/docs/lambdas-md/lambda-20-sign-pdfs-optimized.md
@@ -235,3 +235,101 @@ def handler(event, context):
| 6 | Sync file handle as `Body` | Flat memory for any manifest size |
| 7 | `PageSize=1000` | Fewer S3 round-trips on large prefixes |
| 8 | N concurrent consumers via `asyncio.gather` | Presign throughput scales with `concurrency` |

---

## Further improvements (not yet applied)

These are the next natural steps if this function were going to production. They were left out intentionally — each adds infrastructure or AWS-side configuration that goes beyond the handler itself.

### Idempotency

The function is not idempotent in the strict sense. Each invocation with the same `(bucket, prefix)` event produces a new manifest at a new UUID key. If Lambda retries the invocation (async invocations retry up to 2 times by default, and S3/SNS/EventBridge deliveries are at-least-once), you accumulate duplicate manifests under the `manifests/` prefix.

The standard fix is a DynamoDB dedup table:

```python
import hashlib
import json
import os
import time

import boto3 as _boto3

_ddb = _boto3.resource("dynamodb")
_table = _ddb.Table(os.environ["DEDUP_TABLE"])

def _dedup_key(cfg: dict) -> str:
    raw = f"{cfg['bucket']}#{cfg['prefix']}"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]

# at the top of _run(), before any S3 work:
dedup_key = _dedup_key(cfg)
resp = _table.get_item(Key={"id": dedup_key})
if "Item" in resp:
    return json.loads(resp["Item"]["result"])  # cached — skip all S3 work

# ... do the work ...

# at the end, before returning:
_table.put_item(Item={
    "id": dedup_key,
    "result": json.dumps(result),
    "ttl": int(time.time()) + 86400,  # expires alongside the manifest itself
})
```

The dedup key is derived from the logical job identity (`bucket + prefix`), not the `request_id`.
Using `request_id` would only guard against Lambda's own retries of the same invocation; using the business key also guards against a caller submitting the same job twice.

Powertools for AWS Lambda (Python) has a built-in `@idempotent` decorator that implements this exact pattern, including TTL management and in-progress locking.

**What it requires:** a DynamoDB table, `dynamodb:GetItem` + `dynamodb:PutItem` IAM permissions on the execution role, and the Powertools layer (or `aws-lambda-powertools` in requirements).

### Manifest lifecycle rule

Every invocation writes a new object under `manifests/`. Without cleanup, this prefix grows unbounded. The fix is an S3 lifecycle rule on the bucket — not a handler change:

```json
{
  "Rules": [{
    "ID": "expire-manifests",
    "Filter": { "Prefix": "manifests/" },
    "Status": "Enabled",
    "Expiration": { "Days": 1 }
  }]
}
```

Objects under `manifests/` are deleted by S3 automatically after 1 day. The presigned URLs in those manifests are already short-lived (15 minutes by default), so there's no reason to keep the manifest longer than the URL validity window.

**What it requires:** a `PutBucketLifecycleConfiguration` call during infrastructure provisioning (CDK/Terraform/console) — nothing in the handler.

### Idempotency + manifest lifecycle together

With both in place: a retry returns the same `manifest_url` pointing to the same (still-live) manifest object; after 24 hours the manifest is gone and the dedup record has expired, so the next invocation starts fresh. The combination is clean: no orphaned state accumulates in either the table or the bucket.

### ReportBatchItemFailures (SQS only)

If this function were triggered by an SQS event source mapping (one message = one `(bucket, prefix)` job), the consumer-level `errors` field isn't enough — Lambda needs to know *which SQS messages* failed so it can re-queue only those.
Return a `batchItemFailures` list instead of raising:

```python
def handler(event, context):
    failures = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        try:
            asyncio.run(_run(body, context.aws_request_id))
        except Exception:
            # Report only this message back to SQS; the rest are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without this, a single failed message causes the entire batch to retry, including messages that already succeeded — work is repeated, and the queue can stall indefinitely on a poison-pill message.

**What it requires:** `ReportBatchItemFailures` enabled on the ESM configuration (CDK/Terraform) and restructuring the handler to iterate over `event["Records"]`. Not applicable to direct (RequestResponse) invocations like the local tester uses.

### arm64 / Graviton2

No code change needed. Switch the function's architecture to `arm64` in the deployment config:

```yaml
# SAM
Architectures: [arm64]
```

Graviton2 costs ~20% less per GB-second and typically runs the init phase ~10% faster. The only blocker would be native-code wheels, and `aiobotocore` ships pure Python, so there is no binary incompatibility here. Worth doing as a zero-effort cost and latency win.
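The ~20% figure can be sanity-checked against the published per-GB-second duration prices. A minimal sketch — the rates below are the us-east-1 numbers at the time of writing and should be treated as an assumption, not a constant:

```python
# Lambda duration prices per GB-second, us-east-1 (assumed; verify against
# the current AWS pricing page before relying on them).
X86_PER_GB_S = 0.0000166667
ARM_PER_GB_S = 0.0000133334

# Relative saving of arm64 over x86_64 for identical duration and memory.
saving = 1 - ARM_PER_GB_S / X86_PER_GB_S
print(f"arm64 saves {saving:.1%} per GB-second")
```

Note this only covers the duration charge; per-request pricing is identical across architectures, so the blended saving on a real bill is slightly under 20%.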