# Labs

> Hands-on walkthroughs that modify the existing app. Each lab mutates what you already have — no throw-away exercises.

## Lab 0 — Local sandbox (start here)

**Goal:** run the full stack locally against MinIO with real PDFs.

1. `make install` — creates `.venv` and installs deps
2. `make up` — starts MinIO on :9000 (API) and :9001 (console)
3. `SOURCE_DIR=~/path/to/pdfs make seed` — uploads PDFs to the MinIO bucket
4. `make invoke` — runs `invoke.py`, which calls `handler()` with a minimal event
5. Open `http://localhost:9001` (minioadmin/minioadmin) and find the generated manifest under the `manifests/` prefix

**What you can break:** set `PREFIX` to a non-existent prefix and observe that the handler returns count=0. Set `QUEUE_MAX=1` and observe the backpressure on the producer. Remove `S3_ENDPOINT_URL` and watch the client fail to connect.

## Lab 1 — Deploy to real AWS

**Goal:** package and deploy the function to AWS Lambda, then invoke it against a real S3 bucket.

1. Create an S3 bucket and upload sample PDFs under the `2026/04/` prefix
2. Create an IAM execution role with `s3:GetObject`, `s3:PutObject`, `s3:ListBucket`, and `logs:*`
3. Build the deployment zip inside the Lambda image: `docker run --rm -v $PWD:/var/task public.ecr.aws/lambda/python:3.13 pip install -r requirements.txt -t package/`
4. Create the function: `aws lambda create-function --handler lambda_function.handler …`
5. Invoke: `aws lambda invoke --function-name pdf-scanner --payload '{}' out.json`
6. Verify the manifest appeared in S3 and the presigned URL works

**What you can break:** invoke without `s3:ListBucket` on the bucket ARN (not the object ARN) — observe the AccessDenied error. Watch CloudTrail to see the denied call.

## Lab 2 — Add an S3 trigger

**Goal:** make the function fire automatically when a PDF is uploaded.

1. Add a resource policy entry granting S3 `lambda:InvokeFunction`
2. Configure an S3 event notification on the bucket for `s3:ObjectCreated:*`, filtered to `*.pdf`
3. Upload a PDF and check CloudWatch Logs for the invocation
4. Notice the event structure differs from the manual invoke — update the handler to extract the key from `event["Records"][0]["s3"]["object"]["key"]`

**What you can break:** upload a non-PDF to the same prefix and verify the filter prevents invocation. Remove the resource policy and verify the trigger silently stops firing (no error is surfaced to the uploader — this is the async invocation model).

## Lab 3 — Switch to arm64

**Goal:** migrate to Graviton2 and verify the roughly 20% lower duration price.

1. Rebuild the zip using the arm64 Lambda image: `public.ecr.aws/lambda/python:3.13-arm64`
2. Update the function architecture: `aws lambda update-function-configuration --architectures arm64`
3. Update the function code with the arm64 zip
4. Invoke and compare duration and billed duration in the CloudWatch REPORT lines

**What you can break:** deploy the x86 zip against the arm64 architecture — the function will fail at import time on any C-extension wheels.

## Lab 4 — Enable Provisioned Concurrency

**Goal:** eliminate cold starts on the production alias.

1. Publish version 1: `aws lambda publish-version --function-name pdf-scanner`
2. Create alias `prod` pointing to version 1
3. Enable PC: `aws lambda put-provisioned-concurrency-config --function-name pdf-scanner --qualifier prod --provisioned-concurrent-executions 2`
4. Invoke via the alias ARN and confirm `Init Duration` is absent from the REPORT lines
5. Check your AWS bill after an hour — note the Provisioned Concurrency charges

## Lab 5 — Add X-Ray tracing

**Goal:** see a trace with S3 subsegments in the X-Ray console.

1. Add `aws-xray-sdk` to `requirements.txt` and rebuild the zip
2. Add to `lambda_function.py`: `from aws_xray_sdk.core import patch_all; patch_all()`
3. Enable active tracing on the function and add X-Ray permissions to the execution role
4. Invoke and open X-Ray → Traces in the console — verify that S3 `list_objects_v2` and `generate_presigned_url` appear as subsegments

## Lab 6 — Fan out with Step Functions

**Goal:** process multiple S3 prefixes in parallel using a Map state.

1. Update the handler to accept a `prefix` key in the event (instead of reading it from an env var)
2. Create a Step Functions state machine with a Map state that iterates over a list of prefixes and invokes the Lambda for each
3. Start an execution with input: `{"prefixes": ["2026/01/", "2026/02/", "2026/03/"]}`
4. Observe the parallel Lambda invocations in the execution graph and CloudWatch
5. Add error handling: configure the Map state to catch Lambda errors and continue rather than fail the whole execution
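Lab 6 steps 2 and 5 together amount to a Map state whose iteration catches Lambda errors. A minimal Amazon States Language sketch, assuming the function name from Lab 1 and illustrative state names (`FanOut`, `ScanPrefix`, `RecordFailure`):

```json
{
  "StartAt": "FanOut",
  "States": {
    "FanOut": {
      "Type": "Map",
      "ItemsPath": "$.prefixes",
      "ItemProcessor": {
        "StartAt": "ScanPrefix",
        "States": {
          "ScanPrefix": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
              "FunctionName": "pdf-scanner",
              "Payload": { "prefix.$": "$$.Map.Item.Value" }
            },
            "Catch": [{
              "ErrorEquals": ["States.ALL"],
              "Next": "RecordFailure"
            }],
            "End": true
          },
          "RecordFailure": {
            "Type": "Pass",
            "Result": { "status": "failed" },
            "End": true
          }
        }
      },
      "End": true
    }
  }
}
```

Because the `Catch` routes a failed invocation to a `Pass` state inside the iteration, that iteration still succeeds, so the Map continues with the remaining prefixes instead of failing the whole execution.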
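Lab 0's MinIO-versus-AWS switch hinges on `S3_ENDPOINT_URL`. A minimal sketch of how the client wiring might look; the helper name `s3_client_kwargs` is illustrative, not from the repo:

```python
import os

def s3_client_kwargs():
    """Build kwargs for boto3.client("s3"): target MinIO when
    S3_ENDPOINT_URL is set, otherwise the default AWS endpoint."""
    kwargs = {}
    endpoint = os.environ.get("S3_ENDPOINT_URL")  # e.g. http://localhost:9000
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    return kwargs

# s3 = boto3.client("s3", **s3_client_kwargs())
```

With the variable unset, the client falls back to the real AWS endpoint, which is the failure mode Lab 0 asks you to observe.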
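The handler change in Lab 2 step 4 can be sketched as below. One detail worth knowing: S3 URL-encodes the object key in notification events (spaces arrive as `+`), so decode with `unquote_plus`. The helper name `key_from_event` is an assumption for illustration:

```python
from urllib.parse import unquote_plus

def key_from_event(event):
    """Return the object key from an S3 notification event,
    or None for a manual invoke that carries no Records."""
    records = event.get("Records") or []
    if not records:
        return None  # manual invoke: fall back to env-driven config
    raw = records[0]["s3"]["object"]["key"]
    return unquote_plus(raw)  # S3 URL-encodes keys in notifications
```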
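Lab 6 step 1 asks the handler to read the prefix from the event. A sketch that keeps the env var as a fallback so manual invokes keep working; `resolve_prefix` is a hypothetical helper name:

```python
import os

def resolve_prefix(event):
    """Prefer the per-iteration prefix passed by the Map state;
    fall back to the PREFIX env var for manual invokes."""
    return event.get("prefix") or os.environ.get("PREFIX", "")
```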