| 6 | Sync file handle as `Body` | Flat memory for any manifest size |
| 7 | `PageSize=1000` | Fewer S3 round-trips on large prefixes |
| 8 | N concurrent consumers via `asyncio.gather` | Presign throughput scales with `concurrency` |
---
## Further improvements (not yet applied)
These are the next natural steps if this function were going to production. They were left out intentionally — each adds infrastructure or AWS-side configuration that goes beyond the handler itself.
### Idempotency
The function is not idempotent in the strict sense. Each invocation with the same `(bucket, prefix)` event produces a new manifest at a new UUID key. If Lambda retries the invocation (async invocations retry up to 2 times by default, and S3/SNS/EventBridge are at-least-once), you accumulate duplicate manifests in the `manifests/` prefix.
The standard fix is a DynamoDB dedup table:
```python
import hashlib
import json
import os
import time

import boto3 as _boto3

_ddb = _boto3.resource("dynamodb")
_table = _ddb.Table(os.environ["DEDUP_TABLE"])


def _dedup_key(cfg: dict) -> str:
    raw = f"{cfg['bucket']}#{cfg['prefix']}"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]


# at the top of _run(), before any S3 work:
dedup_key = _dedup_key(cfg)
resp = _table.get_item(Key={"id": dedup_key})
if "Item" in resp:
    return json.loads(resp["Item"]["result"])  # cached — skip all S3 work

# ... do the work ...

# at the end, before returning:
_table.put_item(Item={
    "id": dedup_key,
    "result": json.dumps(result),
    "ttl": int(time.time()) + 86400,  # dedup record expires after 24 hours
})
```
The dedup key is derived from the logical job identity (`bucket + prefix`), not the `request_id`. Using `request_id` would only guard against Lambda's own retries of the same invocation; using the business key guards against a caller submitting the same job twice.
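To make that distinction concrete, here is a small self-contained check (the bucket and prefix values are made up) showing that two deliveries of the same job collapse to one dedup record, while a different prefix gets a fresh one:

```python
import hashlib

def _dedup_key(cfg: dict) -> str:
    # same derivation as above: business identity only, no request_id
    raw = f"{cfg['bucket']}#{cfg['prefix']}"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]

# two deliveries of the same job (different Lambda request_ids) share one key
a = _dedup_key({"bucket": "my-bucket", "prefix": "raw/2024/"})
b = _dedup_key({"bucket": "my-bucket", "prefix": "raw/2024/"})
c = _dedup_key({"bucket": "my-bucket", "prefix": "raw/2025/"})
assert a == b        # retry or duplicate submission: cache hit
assert a != c        # different job: fresh run
assert len(a) == 32  # truncated hex digest fits a small DynamoDB key
```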
AWS PowerTools for Lambda has a built-in `@idempotent` decorator that implements this exact pattern, including TTL management and in-progress locking.
**What it requires:** a DynamoDB table, `dynamodb:GetItem` + `dynamodb:PutItem` IAM permissions on the execution role, and the PowerTools layer (or `aws-lambda-powertools` in requirements).
### Manifest lifecycle rule
Every invocation writes a new object under `manifests/`. Without cleanup, this prefix grows unbounded. The fix is an S3 lifecycle rule on the bucket — not a handler change:
```json
{
  "Rules": [{
    "ID": "expire-manifests",
    "Filter": { "Prefix": "manifests/" },
    "Status": "Enabled",
    "Expiration": { "Days": 1 }
  }]
}
```
Objects under `manifests/` are deleted by S3 automatically after 1 day. The presigned URLs in those manifests are already short-lived (15 minutes by default), so there's no reason to keep the manifest longer than the URL validity window.
**What it requires:** a `PutBucketLifecycleConfiguration` call during infrastructure provisioning (CDK/Terraform/console) — nothing in the handler.
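For reference, the same rule can be applied from provisioning code via boto3's `put_bucket_lifecycle_configuration`. A sketch (the bucket name is hypothetical; the builder just reproduces the JSON rule above):

```python
def manifest_lifecycle_config(days: int = 1) -> dict:
    """Build the lifecycle configuration matching the JSON rule above."""
    return {
        "Rules": [{
            "ID": "expire-manifests",
            "Filter": {"Prefix": "manifests/"},
            "Status": "Enabled",
            "Expiration": {"Days": days},
        }]
    }

# one-time provisioning call (needs s3:PutLifecycleConfiguration on the bucket):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket",  # hypothetical bucket name
#     LifecycleConfiguration=manifest_lifecycle_config(),
# )
```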
### Idempotency + manifest lifecycle together
With both in place: a retry returns the same `manifest_url` pointing to the same (still-live) manifest object; after 24 hours the manifest is gone and the dedup record has expired, so the next invocation starts fresh. The combination is clean.
### ReportBatchItemFailures (SQS only)
If this function were triggered by an SQS event source mapping (one message = one `(bucket, prefix)` job), the consumer-level `errors` field isn't enough — Lambda needs to know *which SQS messages* failed so it can re-queue only those. Return a `batchItemFailures` list instead of raising:
```python
def handler(event, context):
    failures = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        try:
            asyncio.run(_run(body, context.aws_request_id))
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```
Without this, a single failed message causes the entire batch to retry, including messages that succeeded — work is repeated and the queue can stall on a poison-pill message indefinitely.
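The partial-batch behavior can be exercised locally with a stubbed `_run` and a minimal fake context (both are hypothetical stand-ins, not the real handler's dependencies), confirming that only the failing message is reported back:

```python
import asyncio
import json

async def _run(cfg, request_id):
    # hypothetical stub: fail one specific prefix, succeed otherwise
    if cfg["prefix"] == "poison/":
        raise RuntimeError("simulated failure")
    return {"ok": True}

def handler(event, context):
    failures = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        try:
            asyncio.run(_run(body, context.aws_request_id))
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

class _Ctx:  # minimal stand-in for the Lambda context object
    aws_request_id = "test-req"

event = {"Records": [
    {"messageId": "m1", "body": json.dumps({"bucket": "b", "prefix": "ok/"})},
    {"messageId": "m2", "body": json.dumps({"bucket": "b", "prefix": "poison/"})},
]}
out = handler(event, _Ctx())
assert out == {"batchItemFailures": [{"itemIdentifier": "m2"}]}
```

With `ReportBatchItemFailures` enabled, SQS re-queues only `m2`; `m1` is deleted as successfully processed.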
**What it requires:** `ReportBatchItemFailures` enabled on the ESM configuration (CDK/Terraform) and restructuring the handler to iterate over `event["Records"]`. Not applicable to direct (RequestResponse) invocations like the local tester uses.
### arm64 / Graviton2
No code change needed. Switch the function's architecture to `arm64` in the deployment config:
```yaml
# SAM
Architectures: [arm64]
```
Graviton2 costs ~20% less per GB-second and typically runs the init phase ~10% faster. The only blocker is native-code wheels: `aiobotocore` ships pure Python so there's no binary incompatibility here. Worth doing as a zero-effort cost and latency win.