| # | Change | Benefit |
|---|---|---|
| 6 | Sync file handle as `Body` | Flat memory for any manifest size |
| 7 | `PageSize=1000` | Fewer S3 round-trips on large prefixes |
| 8 | N concurrent consumers via `asyncio.gather` | Presign throughput scales with `concurrency` |
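
The pattern in row 8 is a fixed pool of consumers draining a shared queue. A minimal standalone sketch of that shape, where `presign` is a hypothetical stand-in for the real per-key S3 presigning call:

```python
import asyncio

async def presign(key: str) -> str:
    # Hypothetical stand-in for the real presigning call; just echoes the key.
    await asyncio.sleep(0)
    return f"https://example.com/{key}?sig=..."

async def run_pool(keys, concurrency: int = 8):
    queue: asyncio.Queue = asyncio.Queue()
    for key in keys:
        queue.put_nowait(key)
    results = []

    async def consumer():
        # Each consumer pulls keys until the queue is drained.
        while True:
            try:
                key = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            results.append(await presign(key))

    # N concurrent consumers; throughput scales with `concurrency`.
    await asyncio.gather(*(consumer() for _ in range(concurrency)))
    return results

urls = asyncio.run(run_pool([f"k{i}" for i in range(5)], concurrency=2))
print(len(urls))  # 5
```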
---

## Further improvements (not yet applied)

These are the next natural steps if this function were going to production. They were left out intentionally — each adds infrastructure or AWS-side configuration that goes beyond the handler itself.

### Idempotency

The function is not idempotent in the strict sense. Each invocation with the same `(bucket, prefix)` event produces a new manifest at a new UUID key. If Lambda retries the invocation (async invocations retry up to 2 times by default, and S3/SNS/EventBridge are at-least-once), you accumulate duplicate manifests in the `manifests/` prefix.

The standard fix is a DynamoDB dedup table:
```python
import hashlib
import json
import os
import time

import boto3 as _boto3

_ddb = _boto3.resource("dynamodb")
_table = _ddb.Table(os.environ["DEDUP_TABLE"])


def _dedup_key(cfg: dict) -> str:
    raw = f"{cfg['bucket']}#{cfg['prefix']}"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]


# at the top of _run(), before any S3 work:
dedup_key = _dedup_key(cfg)
resp = _table.get_item(Key={"id": dedup_key})
if "Item" in resp:
    return json.loads(resp["Item"]["result"])  # cached — skip all S3 work

# ... do the work ...

# at the end, before returning:
_table.put_item(Item={
    "id": dedup_key,
    "result": json.dumps(result),
    "ttl": int(time.time()) + 86400,  # requires TTL enabled on the table
})
```

The dedup key is derived from the logical job identity (`bucket + prefix`), not the `request_id`. Using `request_id` would only guard against Lambda's own retries of the same invocation; using the business key guards against a caller submitting the same job twice.

Powertools for AWS Lambda has a built-in `@idempotent` decorator that implements this exact pattern, including TTL management and in-progress locking.

**What it requires:** a DynamoDB table, `dynamodb:GetItem` + `dynamodb:PutItem` IAM permissions on the execution role, and the Powertools layer (or `aws-lambda-powertools` in requirements).
### Manifest lifecycle rule

Every invocation writes a new object under `manifests/`. Without cleanup, this prefix grows unbounded. The fix is an S3 lifecycle rule on the bucket — not a handler change:
```json
{
  "Rules": [{
    "ID": "expire-manifests",
    "Filter": { "Prefix": "manifests/" },
    "Status": "Enabled",
    "Expiration": { "Days": 1 }
  }]
}
```

Objects under `manifests/` are deleted by S3 automatically after 1 day. The presigned URLs in those manifests are already short-lived (15 minutes by default), so there's no reason to keep the manifest longer than the URL validity window.

**What it requires:** a `PutBucketLifecycleConfiguration` call during infrastructure provisioning (CDK/Terraform/console) — nothing in the handler.

### Idempotency + manifest lifecycle together

With both in place: a retry returns the same `manifest_url` pointing to the same (still-live) manifest object; after 24 hours the manifest is gone and the dedup record has expired, so the next invocation starts fresh. The combination is clean.

### ReportBatchItemFailures (SQS only)

If this function were triggered by an SQS event source mapping (one message = one `(bucket, prefix)` job), the consumer-level `errors` field isn't enough — Lambda needs to know *which SQS messages* failed so it can re-queue only those. Return a `batchItemFailures` list instead of raising:
```python
def handler(event, context):
    failures = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        try:
            asyncio.run(_run(body, context.aws_request_id))
        except Exception:
            # Report only this message as failed; successful ones are not re-queued.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```
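
The behavior can be checked locally by stubbing `_run` (the stub, the context object, and the event below are all hypothetical): the poison message fails alone while its batch-mate succeeds.

```python
import asyncio
import json

async def _run(cfg, request_id):
    # Hypothetical stub: fail on one specific prefix to simulate a poison pill.
    if cfg["prefix"] == "bad/":
        raise RuntimeError("boom")
    return {"ok": True}

def handler(event, context):
    failures = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        try:
            asyncio.run(_run(body, context.aws_request_id))
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

class Ctx:
    aws_request_id = "req-1"

event = {"Records": [
    {"messageId": "m1", "body": json.dumps({"bucket": "b", "prefix": "ok/"})},
    {"messageId": "m2", "body": json.dumps({"bucket": "b", "prefix": "bad/"})},
]}
print(handler(event, Ctx()))  # {'batchItemFailures': [{'itemIdentifier': 'm2'}]}
```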

Without this, a single failed message causes the entire batch to retry, including messages that succeeded — work is repeated and the queue can stall on a poison-pill message indefinitely.

**What it requires:** `ReportBatchItemFailures` enabled on the ESM configuration (CDK/Terraform) and restructuring the handler to iterate over `event["Records"]`. Not applicable to direct (RequestResponse) invocations like the local tester uses.

### arm64 / Graviton2

No code change needed. Switch the function's architecture to `arm64` in the deployment config:
```yaml
# SAM
Architectures: [arm64]
```

Graviton2 costs ~20% less per GB-second and typically runs the init phase ~10% faster. The only blocker is native-code wheels: `aiobotocore` ships pure Python so there's no binary incompatibility here. Worth doing as a zero-effort cost and latency win.