Labs
Hands-on walkthroughs that modify the existing app. Each mutates what you already have — no throw-away exercises.
Lab 0 — Local sandbox (start here)
Goal: run the full stack locally against MinIO with real PDFs.
- make install: creates .venv and installs deps
- make up: starts MinIO on :9000 (API) and :9001 (console)
- SOURCE_DIR=~/path/to/pdfs make seed: uploads PDFs to the MinIO bucket
- make invoke: runs invoke.py, which calls handler() with a minimal event
- Open http://localhost:9001 (minioadmin/minioadmin) and find the generated manifest in the manifests/ prefix
What you can break: set PREFIX to a non-existent prefix and observe the handler returns count=0. Set QUEUE_MAX=1 and observe the backpressure on the producer. Remove S3_ENDPOINT_URL and watch it fail to connect.
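The QUEUE_MAX experiment relies on a bounded queue: once the queue is full, the producer has to wait until a consumer frees a slot. The sketch below reproduces that behaviour with the standard library only. It assumes the app uses something equivalent to queue.Queue(maxsize=QUEUE_MAX); run_pipeline and its parameters are hypothetical names for illustration, not the app's actual code.

```python
import queue
import threading
import time


def run_pipeline(keys, queue_max, work_seconds=0.01):
    """Feed keys through a bounded queue; count how often the producer had to wait."""
    q = queue.Queue(maxsize=queue_max)
    processed = []
    waits = 0

    def consumer():
        while True:
            key = q.get()
            if key is None:  # sentinel: the producer is done
                return
            time.sleep(work_seconds)  # stand-in for per-object work
            processed.append(key)

    worker = threading.Thread(target=consumer)
    worker.start()

    for key in keys:
        try:
            q.put_nowait(key)
        except queue.Full:
            waits += 1  # backpressure: the queue is full
            q.put(key)  # block until the consumer frees a slot
    q.put(None)
    worker.join()
    return processed, waits
```

With queue_max=1 the producer should hit the full queue almost immediately. With a queue larger than the number of keys it never waits.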
Lab 1 — Deploy to real AWS
Goal: package and deploy the function to AWS Lambda, invoke it against a real S3 bucket.
- Create an S3 bucket and upload sample PDFs to the 2026/04/ prefix
- Create an IAM execution role with s3:GetObject, s3:PutObject, s3:ListBucket, and logs:*
- Build the deployment zip inside the Lambda image. The base image's default entrypoint is the Lambda runtime, so set pip as the entrypoint: docker run --rm --entrypoint pip -v $PWD:/var/task public.ecr.aws/lambda/python:3.13 install -r requirements.txt -t package/
- Create the function: aws lambda create-function --handler lambda_function.handler …
- Invoke: aws lambda invoke --function-name pdf-scanner --payload '{}' out.json
- Verify the manifest appeared in S3 and the presigned URL works
What you can break: invoke without s3:ListBucket on the bucket (not the object ARN) — observe AccessDenied. Watch CloudTrail to see the denied call.
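The bucket-versus-object distinction is easier to see in a policy document. The following is a minimal sketch of an execution-role policy, not the project's actual policy: execution_policy and the bucket name are hypothetical. The logs statement is deliberately narrower than the logs:* used in the steps above.

```python
import json


def execution_policy(bucket):
    """Build an IAM policy document for the function's execution role."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is a bucket-level action: the resource is the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # GetObject and PutObject are object-level: the resource is the keys.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
            {
                # Narrower alternative to logs:* (an assumption, adjust as needed).
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": ["*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-pdf-bucket" is a placeholder bucket name.
    print(json.dumps(execution_policy("my-pdf-bucket"), indent=2))
```

Attaching s3:ListBucket to the arn:aws:s3:::bucket/* resource instead of the bucket ARN is the mistake that produces the AccessDenied described above.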
Lab 2 — Add an S3 trigger
Goal: make the function fire automatically when a PDF is uploaded.
- Add a resource policy entry granting S3 lambda:InvokeFunction
- Configure an S3 event notification on the bucket for s3:ObjectCreated:* with a suffix filter of .pdf (notification filters match a prefix or suffix, not a glob such as *.pdf)
- Upload a PDF and check CloudWatch Logs for the invocation
- Notice the event structure differs from the manual invoke; update the handler to extract the key from event["Records"][0]["s3"]["object"]["key"]
What you can break: upload a non-PDF to the same prefix and verify the filter prevents invocation. Remove the resource policy and verify the trigger silently stops firing (no error to the uploader — this is the async invocation model).
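One way to handle both event shapes is to pull the bucket and key out of any S3 records and treat an event without records as a manual invoke. This is a sketch under that assumption; keys_from_event is a hypothetical helper, not part of the existing handler. Object keys in S3 notifications arrive URL-encoded, so the key is decoded before use.

```python
from urllib.parse import unquote_plus


def keys_from_event(event):
    """Return (bucket, key) pairs from an S3 notification, or [] for a manual invoke."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if s3 is None:
            continue  # not an S3 record
        bucket = s3["bucket"]["name"]
        # S3 URL-encodes keys in notifications (spaces arrive as "+").
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs
```

Iterating over all records rather than reading only Records[0] avoids silently dropping keys if a notification ever carries more than one record.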
Lab 3 — Switch to arm64
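Goal: migrate to Graviton2 (arm64) and measure the cost reduction. The arm64 duration price is about 20% lower per GB-second, so the actual saving depends on how billed duration changes.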
- Rebuild the zip using the arm64 Lambda image: public.ecr.aws/lambda/python:3.13-arm64
- Update the architecture and the code in one call, since the architecture is set through update-function-code rather than update-function-configuration: aws lambda update-function-code --function-name pdf-scanner --architectures arm64 --zip-file fileb://<arm64 zip>
- Invoke and compare REPORT duration and billed duration in CloudWatch
What you can break: try deploying the x86 zip against the arm64 architecture — the function will import-error on any C-extension wheels.
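To turn the billed durations from the REPORT lines into a cost comparison, the arithmetic can be sketched as below. The per-GB-second prices are assumptions based on the us-east-1 list prices and should be checked against the current Lambda pricing page for your region. The function names are hypothetical.

```python
# Assumed duration prices in USD per GB-second (verify for your region).
PRICE_PER_GB_SECOND = {"x86_64": 0.0000166667, "arm64": 0.0000133334}


def duration_cost(billed_ms, memory_mb, architecture):
    """Duration charge in USD for one invocation (request charge excluded)."""
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND[architecture]


def saving_percent(x86_billed_ms, arm_billed_ms, memory_mb):
    """Percentage saved by arm64 relative to x86_64 for the same workload."""
    x86 = duration_cost(x86_billed_ms, memory_mb, "x86_64")
    arm = duration_cost(arm_billed_ms, memory_mb, "arm64")
    return 100 * (x86 - arm) / x86
```

With identical billed durations the saving comes out at roughly 20%. If the arm64 run takes about 25% longer, the saving disappears, which is why the comparison needs the measured durations and not just the price difference.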
Lab 4 — Enable Provisioned Concurrency
Goal: eliminate cold starts on the production alias.
- Publish version 1: aws lambda publish-version --function-name pdf-scanner
- Create alias prod pointing to version 1
- Enable PC: aws lambda put-provisioned-concurrency-config --function-name pdf-scanner --qualifier prod --provisioned-concurrent-executions 2
- Invoke via the alias ARN and confirm Init Duration is absent from REPORT lines
- Check your AWS bill after 1 hour and note the PC charges
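Checking REPORT lines by eye gets tedious after a few invocations. A small sketch for scanning exported log lines follows. It assumes the usual REPORT line layout, in which cold starts carry an "Init Duration: N ms" field; cold_starts is a hypothetical helper.

```python
import re

INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def cold_starts(log_lines):
    """Return the Init Duration (ms) of every REPORT line that has one."""
    durations = []
    for line in log_lines:
        if not line.startswith("REPORT"):
            continue  # START, END and application log lines are ignored
        match = INIT_RE.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations
```

An empty result for invocations made through the prod alias is what this lab is looking for. Any value in the list marks an invocation that went through an on-demand cold start.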
Lab 5 — Add X-Ray tracing
Goal: see a trace with S3 subsegments in the X-Ray console.
- Add aws-xray-sdk to requirements.txt and rebuild the zip
- Add to lambda_function.py: from aws_xray_sdk.core import patch_all; patch_all()
- Enable active tracing on the function and add X-Ray permissions to the execution role
- Invoke and open X-Ray → Traces in the console; verify the S3 list_objects_v2 call appears as a subsegment. generate_presigned_url signs the URL locally without calling S3, so it does not produce a subsegment of its own
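Because the same code also runs in the Lab 0 sandbox, it can help to patch only when the function is actually running in Lambda. The sketch below is one possible guard, not the project's code: enable_tracing is a hypothetical helper, and it relies on the AWS_LAMBDA_FUNCTION_NAME variable that the Lambda runtime sets.

```python
import os


def enable_tracing(environ=None):
    """Patch supported libraries for X-Ray only when running inside Lambda."""
    environ = os.environ if environ is None else environ
    if "AWS_LAMBDA_FUNCTION_NAME" not in environ:
        return False  # local run (Lab 0 sandbox): skip tracing
    try:
        from aws_xray_sdk.core import patch_all
    except ImportError:
        return False  # SDK was not packaged into the zip
    patch_all()  # instruments boto3/botocore so S3 API calls become subsegments
    return True
```

Calling enable_tracing() once at module import time in lambda_function.py keeps the patching out of the per-invocation path.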
Lab 6 — Fan out with Step Functions
Goal: process multiple S3 prefixes in parallel using a Map state.
- Update the handler to accept a prefix key in the event (instead of reading it from an env var)
- Create a Step Functions state machine with a Map state that iterates over a list of prefixes and invokes the Lambda for each
- Start an execution with input: {"prefixes": ["2026/01/", "2026/02/", "2026/03/"]}
- Observe parallel Lambda invocations in the execution graph and CloudWatch
- Add error handling: add a Catch to the Lambda task inside the Map state so that a failed prefix is recorded and the remaining prefixes continue, rather than failing the whole execution
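The steps above can be sketched as a handler-side helper plus a state machine definition built as a Python dict. This is a minimal sketch under assumptions: the state names, resolve_prefix, and state_machine_definition are hypothetical, and the handler is assumed to keep PREFIX as a fallback for manual invokes.

```python
import json
import os


def resolve_prefix(event):
    """Prefer the prefix passed in the event; fall back to the PREFIX env var."""
    return event.get("prefix") or os.environ.get("PREFIX", "")


def state_machine_definition(function_name):
    """Amazon States Language definition: one Lambda invocation per prefix."""
    return {
        "StartAt": "ScanPrefixes",
        "States": {
            "ScanPrefixes": {
                "Type": "Map",
                "ItemsPath": "$.prefixes",
                # Wrap each list item as {"prefix": "<item>"} for the handler.
                "ItemSelector": {"prefix.$": "$$.Map.Item.Value"},
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "ScanOne",
                    "States": {
                        "ScanOne": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::lambda:invoke",
                            "Parameters": {
                                "FunctionName": function_name,
                                "Payload.$": "$",
                            },
                            # A failed prefix is routed to RecordFailure
                            # instead of failing the whole Map state.
                            "Catch": [
                                {
                                    "ErrorEquals": ["States.ALL"],
                                    "ResultPath": "$.error",
                                    "Next": "RecordFailure",
                                }
                            ],
                            "End": True,
                        },
                        "RecordFailure": {"Type": "Pass", "End": True},
                    },
                },
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(state_machine_definition("pdf-scanner"), indent=2))
```

Placing the Catch on the task inside the iterator, rather than on the Map state, is what lets the other prefixes finish: each iteration ends in either ScanOne or RecordFailure, and the Map output then contains an error entry for the failed items.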