# Labs

> Hands-on walkthroughs that modify the existing app. Each lab mutates what you already have — no throw-away exercises.

## Lab 0 — Local sandbox (start here)

**Goal:** run the full stack locally against MinIO with real PDFs.

1. `make install` — creates `.venv` and installs deps
2. `make up` — starts MinIO on :9000 (API) and :9001 (console)
3. `SOURCE_DIR=~/path/to/pdfs make seed` — uploads PDFs to the MinIO bucket
4. `make invoke` — runs `invoke.py`, which calls `handler()` with a minimal event
5. Open `http://localhost:9001` (minioadmin/minioadmin) and find the generated manifest under the `manifests/` prefix

**What you can break:** set `PREFIX` to a non-existent prefix and observe that the handler returns count=0. Set `QUEUE_MAX=1` and observe the backpressure on the producer. Remove `S3_ENDPOINT_URL` and watch the client fail to connect.

## Lab 1 — Deploy to real AWS

**Goal:** package and deploy the function to AWS Lambda, then invoke it against a real S3 bucket.

1. Create an S3 bucket and upload sample PDFs under the `2026/04/` prefix
2. Create an IAM execution role with `s3:GetObject`, `s3:PutObject`, `s3:ListBucket`, and `logs:*`
3. Build the deployment zip inside the Lambda image: `docker run --rm -v $PWD:/var/task public.ecr.aws/lambda/python:3.13 pip install -r requirements.txt -t package/`
4. Create the function: `aws lambda create-function --handler lambda_function.handler …`
5. Invoke: `aws lambda invoke --function-name pdf-scanner --payload '{}' out.json`
6. Verify the manifest appeared in S3 and the presigned URL works

**What you can break:** invoke without `s3:ListBucket` on the bucket ARN (not the object ARN) — observe the AccessDenied error. Watch CloudTrail to see the denied call.

## Lab 2 — Add an S3 trigger

**Goal:** make the function fire automatically when a PDF is uploaded.

1. Add a resource policy entry granting S3 `lambda:InvokeFunction`
2. Configure an S3 event notification on the bucket for `s3:ObjectCreated:*`, filtered to `*.pdf`
3. Upload a PDF and check CloudWatch Logs for the invocation
4. Notice the event structure differs from the manual invoke — update the handler to extract the key from `event["Records"][0]["s3"]["object"]["key"]`

**What you can break:** upload a non-PDF to the same prefix and verify the filter prevents invocation. Remove the resource policy and verify the trigger silently stops firing (no error is surfaced to the uploader — this is the async invocation model).

## Lab 3 — Switch to arm64

**Goal:** migrate to Graviton2 and verify the roughly 20% lower duration price.

1. Rebuild the zip using the arm64 Lambda image: `public.ecr.aws/lambda/python:3.13-arm64`
2. Update the function architecture: `aws lambda update-function-configuration --architectures arm64`
3. Update the function code with the arm64 zip
4. Invoke and compare duration and billed duration in the CloudWatch REPORT lines

**What you can break:** deploy the x86 zip against the arm64 architecture — the function will fail at import time on any C-extension wheels.

## Lab 4 — Enable Provisioned Concurrency

**Goal:** eliminate cold starts on the production alias.

1. Publish version 1: `aws lambda publish-version --function-name pdf-scanner`
2. Create alias `prod` pointing to version 1
3. Enable PC: `aws lambda put-provisioned-concurrency-config --function-name pdf-scanner --qualifier prod --provisioned-concurrent-executions 2`
4. Invoke via the alias ARN and confirm `Init Duration` is absent from the REPORT lines
5. Check your AWS bill after an hour — note the Provisioned Concurrency charges

## Lab 5 — Add X-Ray tracing

**Goal:** see a trace with S3 subsegments in the X-Ray console.

1. Add `aws-xray-sdk` to `requirements.txt` and rebuild the zip
2. Add to `lambda_function.py`: `from aws_xray_sdk.core import patch_all; patch_all()`
3. Enable active tracing on the function and add X-Ray permissions to the execution role
4. Invoke and open X-Ray → Traces in the console — verify that S3 `list_objects_v2` and `generate_presigned_url` appear as subsegments

## Lab 6 — Fan out with Step Functions

**Goal:** process multiple S3 prefixes in parallel using a Map state.

1. Update the handler to accept a `prefix` key in the event (instead of reading it from an env var)
2. Create a Step Functions state machine with a Map state that iterates over a list of prefixes and invokes the Lambda for each
3. Start an execution with input: `{"prefixes": ["2026/01/", "2026/02/", "2026/03/"]}`
4. Observe the parallel Lambda invocations in the execution graph and CloudWatch
5. Add error handling: configure the Map state to catch Lambda errors and continue rather than fail the whole execution
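Lab 6 steps 2 and 5 together amount to a Map state whose iteration catches Lambda errors. A minimal Amazon States Language sketch, assuming the function name from Lab 1 and illustrative state names (`FanOut`, `ScanPrefix`, `RecordFailure`):

```json
{
  "StartAt": "FanOut",
  "States": {
    "FanOut": {
      "Type": "Map",
      "ItemsPath": "$.prefixes",
      "ItemProcessor": {
        "StartAt": "ScanPrefix",
        "States": {
          "ScanPrefix": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
              "FunctionName": "pdf-scanner",
              "Payload": { "prefix.$": "$$.Map.Item.Value" }
            },
            "Catch": [{
              "ErrorEquals": ["States.ALL"],
              "Next": "RecordFailure"
            }],
            "End": true
          },
          "RecordFailure": {
            "Type": "Pass",
            "Result": { "status": "failed" },
            "End": true
          }
        }
      },
      "End": true
    }
  }
}
```

Because the `Catch` routes a failed invocation to a `Pass` state inside the iteration, that iteration still succeeds, so the Map continues with the remaining prefixes instead of failing the whole execution.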
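Lab 0's MinIO-versus-AWS switch hinges on `S3_ENDPOINT_URL`. A minimal sketch of how the client wiring might look; the helper name `s3_client_kwargs` is illustrative, not from the repo:

```python
import os

def s3_client_kwargs():
    """Build kwargs for boto3.client("s3"): target MinIO when
    S3_ENDPOINT_URL is set, otherwise the default AWS endpoint."""
    kwargs = {}
    endpoint = os.environ.get("S3_ENDPOINT_URL")  # e.g. http://localhost:9000
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    return kwargs

# s3 = boto3.client("s3", **s3_client_kwargs())
```

With the variable unset, the client falls back to the real AWS endpoint, which is the failure mode Lab 0 asks you to observe.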
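The handler change in Lab 2 step 4 can be sketched as below. One detail worth knowing: S3 URL-encodes the object key in notification events (spaces arrive as `+`), so decode with `unquote_plus`. The helper name `key_from_event` is an assumption for illustration:

```python
from urllib.parse import unquote_plus

def key_from_event(event):
    """Return the object key from an S3 notification event,
    or None for a manual invoke that carries no Records."""
    records = event.get("Records") or []
    if not records:
        return None  # manual invoke: fall back to env-driven config
    raw = records[0]["s3"]["object"]["key"]
    return unquote_plus(raw)  # S3 URL-encodes keys in notifications
```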
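Lab 6 step 1 asks the handler to read the prefix from the event. A sketch that keeps the env var as a fallback so manual invokes keep working; `resolve_prefix` is a hypothetical helper name:

```python
import os

def resolve_prefix(event):
    """Prefer the per-iteration prefix passed by the Map state;
    fall back to the PREFIX env var for manual invokes."""
    return event.get("prefix") or os.environ.get("PREFIX", "")
```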