tentative Kubernetes deployment configuration

This commit is contained in:
buenosairesam
2025-08-25 04:00:42 -03:00
parent 622e8adb69
commit c7822c1ec4
87 changed files with 3661 additions and 145 deletions

CLAUDE.md

@@ -409,7 +409,7 @@ This approach gives you:
The planning already supports this perfectly - you develop fast locally and test orchestration when needed!
CURRENT STEP
STEP DONE
now that I have a basic stream pipeline I need to add an AI feature (something simple) before scaling everything up. I was thinking of detecting my extended hand, but maybe a more business-oriented feature would work better (read the very first message with the JD in Spanish to think about this)
@@ -504,4 +504,335 @@ This fits perfectly in AI Analysis → Local Single Server:
CLIP logo classification (instead of YOLO objects)
Analysis results storage
Results API endpoint
Want to start with basic CLIP brand classification, or go straight for the object detection + brand recognition combo?
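For orientation, here is a minimal zero-shot sketch of the "basic CLIP brand classification" option (assuming Hugging Face `transformers`; the brand labels and file name are illustrative, not the project's actual pipeline):

```python
# Zero-shot brand classification with CLIP (sketch, not the project pipeline).
# Assumes `pip install transformers pillow torch`; labels are illustrative.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a Coca-Cola logo", "a Nike logo", "no visible logo"]  # illustrative
image = Image.open("frame.jpg")  # a frame grabbed from the stream

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))
```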
CURRENT STEP
Demo Strategy
For Interviewers:
Show local demo: "Here's the system running locally with GCP integration"
Explain cloud strategy: "This architecture scales to full GCP with these manifests"
Demonstrate cloud services: "Logo detection uses Cloud Vision API"
Show cost optimization: "Uses free tier effectively while maintaining production patterns"
Scaling Path:
Local K8s → Single VM K8s → GKE Autopilot → Full GKE
Questions for You:
1. Comfort with hybrid approach? Local K8s + GCP services vs pure cloud?
2. Demo requirements? Do interviewers need to access it remotely, or is screen sharing sufficient?
3. AI model preference? Keep local CLIP for demo, or switch to Cloud Vision for "production"?
4. Time investment? How much time do you want to spend on infrastructure vs other features?
The hybrid approach gives you the best of both worlds - demonstrates K8s and GCP knowledge without the costs!
myself said: 1. definitely what I prefer. 2. I'd like to extend this beyond just this interview, as a portfolio asset for others and for people interested in learning how to do it. 3. I reckon the design decisions allow for easy switching between the two. 4. Today's focus is only infra (I only had time for end-to-end logo detection, so shader effects are postponed indefinitely; I separated the visual properties into a different Celery task to put in different queues, but it's commented out so we can test just logo detection end-to-end, and that's the feature that must work after completing this prod-ready phase)
Perfect! Your hybrid approach as a portfolio/learning asset changes everything - much better than a throwaway demo. Let me design this for educational value and real-world applicability.
Portfolio-Ready Architecture
Design Principles:
Teachable: Others can follow your setup
Realistic: Mirrors actual production patterns
Cost-effective: $0 monthly with scaling options
Demonstrable: Works for interviews and learning
Hybrid K8s + GCP Implementation
Phase 1: Local Foundation (Today's Focus)
```
┌─────────────────────────────────────────────────────────┐
│                   Local KIND Cluster                    │
├─────────────────┬─────────────────┬─────────────────────┤
│ Django Pod      │ Worker Pod      │ Infrastructure      │
│ • Web API       │ • Logo Analysis │ • PostgreSQL        │
│ • Admin         │ • Visual Tasks  │ • Redis             │
│ • WebSocket     │ • Celery        │ • NGINX             │
└─────────────────┴─────────────────┴─────────────────────┘
                      ┌─────────────┐
                      │ GCP Services│
                      │ • Storage   │
                      │ • Vision AI │
                      │ • Logging   │
                      └─────────────┘
```
K8s Manifests Structure:
```
k8s/
├── base/                          # Platform-agnostic
│   ├── django-deployment.yaml    # Web application
│   ├── worker-deployment.yaml    # Celery workers
│   ├── postgres-statefulset.yaml # Database
│   ├── redis-deployment.yaml     # Cache/Queue
│   ├── nginx-deployment.yaml     # Reverse proxy
│   └── services.yaml             # Internal networking
└── overlays/
    ├── local/                    # KIND cluster
    │   ├── kustomization.yaml    # Local configs
    │   └── local-storage.yaml    # Local volumes
    ├── gcp-single-vm/            # Single VM deployment
    │   ├── kustomization.yaml    # VM-specific configs
    │   └── resource-limits.yaml  # Memory constraints
    └── gcp-production/           # Full GKE deployment
        ├── kustomization.yaml    # Production configs
        ├── hpa.yaml              # Auto-scaling
        └── monitoring.yaml       # Observability
```
Today's Infrastructure Implementation
1. GCP Project Setup (Free Tier)
```bash
# Create project and enable APIs
gcloud projects create media-analyzer-demo
gcloud config set project media-analyzer-demo
gcloud services enable vision.googleapis.com
gcloud services enable storage.googleapis.com
gcloud services enable logging.googleapis.com

# Create service account
gcloud iam service-accounts create media-analyzer \
  --display-name="Media Analyzer Service Account"

# Create and download key
gcloud iam service-accounts keys create credentials.json \
  --iam-account=media-analyzer@media-analyzer-demo.iam.gserviceaccount.com

# Create storage bucket
gsutil mb gs://media-analyzer-segments
```
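Before wiring this into K8s, it's worth a quick local sanity check that the key and Vision API work; a minimal sketch, assuming the `google-cloud-vision` client library is installed and the test image name is illustrative:

```python
# Sanity check: run Cloud Vision logo detection with the new service account.
# Assumes `pip install google-cloud-vision` and GOOGLE_APPLICATION_CREDENTIALS
# pointing at the downloaded credentials.json; the image name is illustrative.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("test-frame.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.logo_detection(image=image)
for logo in response.logo_annotations:
    print(logo.description, logo.score)
```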
2. Local K8s with GCP Integration
```yaml
# k8s/base/django-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: django-app
spec:
  replicas: 2
  selector:            # required: must match the pod template labels
    matchLabels:
      app: django-app
  template:
    metadata:
      labels:
        app: django-app
    spec:
      containers:
        - name: django
          image: media-analyzer-backend:latest
          env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: "/var/secrets/google/credentials.json"
            - name: GCP_BUCKET_NAME
              value: "media-analyzer-segments"
            - name: USE_CLOUD_VISION
              value: "true"
          volumeMounts:
            - name: gcp-credentials
              mountPath: /var/secrets/google
              readOnly: true
      volumes:
        - name: gcp-credentials
          secret:
            secretName: gcp-credentials
```
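On the app side, `GCP_BUCKET_NAME` drives segment uploads; a minimal sketch of what that could look like (assuming the `google-cloud-storage` client; the function and argument names are illustrative):

```python
# Upload an HLS segment to the bucket named by GCP_BUCKET_NAME (sketch).
# Assumes `pip install google-cloud-storage`; names are illustrative.
import os
from google.cloud import storage

def upload_segment(local_path: str, dest_name: str) -> str:
    client = storage.Client()  # picks up GOOGLE_APPLICATION_CREDENTIALS
    bucket = client.bucket(os.environ["GCP_BUCKET_NAME"])
    blob = bucket.blob(dest_name)
    blob.upload_from_filename(local_path)
    return f"gs://{bucket.name}/{dest_name}"
```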
3. Environment Switching Design
```python
# backend/media_analyzer/settings/base.py
import os

# AI Processing Configuration
USE_CLOUD_VISION = os.getenv('USE_CLOUD_VISION', 'false').lower() == 'true'
USE_LOCAL_CLIP = not USE_CLOUD_VISION

# Storage Configuration
USE_CLOUD_STORAGE = os.getenv('USE_CLOUD_STORAGE', 'false').lower() == 'true'
GCP_BUCKET_NAME = os.getenv('GCP_BUCKET_NAME', 'media-analyzer-segments')

# AI Analysis Strategy
if USE_CLOUD_VISION:
    AI_ANALYSIS_BACKEND = 'ai_processing.backends.CloudVisionBackend'
else:
    AI_ANALYSIS_BACKEND = 'ai_processing.backends.LocalClipBackend'
```
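For the dotted path to actually switch implementations, the worker can resolve it lazily at call time; a minimal sketch (`import_string` is Django's real helper, while the `analyze()` interface shown is an assumed contract between the two backends):

```python
# Resolve the configured AI backend from its dotted path at call time (sketch).
# import_string is Django's; the analyze() contract is an assumption.
from django.conf import settings
from django.utils.module_loading import import_string

def get_analysis_backend():
    backend_cls = import_string(settings.AI_ANALYSIS_BACKEND)
    return backend_cls()

# usage inside a Celery task (assumed interface):
# results = get_analysis_backend().analyze(frame_bytes)
```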
Portfolio Documentation Strategy
README Structure:
```markdown
# Media Analyzer - Production K8s Deployment

## Architecture Options
- [Local Development](docs/local-setup.md) - Docker Compose
- [Hybrid K8s](docs/hybrid-k8s.md) - KIND + GCP Services
- [Single VM](docs/single-vm-k8s.md) - k3s on GCP e2-micro
- [Full Production](docs/production-gke.md) - GKE with auto-scaling

## Cost Analysis
- Local: $0/month
- Hybrid: $0/month (free tier)
- Single VM: $0/month (free tier e2-micro)
- Production: ~$150/month (full GKE)

## Features Demonstrated
- ✅ Real-time video streaming (RTMP → HLS)
- ✅ AI-powered logo detection (CLIP/Cloud Vision)
- ✅ Kubernetes orchestration
- ✅ Cloud-native patterns
- ⏳ Auto-scaling and monitoring
```
Deployment Commands for Today:
```bash
# 1. Setup GCP credentials in K8s
kubectl create secret generic gcp-credentials \
  --from-file=credentials.json=./credentials.json

# 2. Deploy to local KIND
kubectl apply -k k8s/overlays/local

# 3. Port forward for access
kubectl port-forward service/nginx 8080:80

# 4. Test logo detection
curl http://localhost:8080/api/streams/
```
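For something repeatable beyond a one-off curl, a tiny smoke-test sketch (assuming `requests`; the endpoint comes from the commands above, while the JSON response shape is an assumption):

```python
# Smoke test the port-forwarded API before demoing (sketch).
# Assumes `pip install requests`; the response shape is an assumption.
import requests

resp = requests.get("http://localhost:8080/api/streams/", timeout=5)
resp.raise_for_status()
print(f"API up, {len(resp.json())} stream(s) registered")
```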
Learning/Portfolio Value:
For Others:
Step-by-step deployment guides
Cost optimization strategies
Environment switching patterns
Real-world K8s manifests
For Interviews:
"Here's how I optimized for $0 monthly cost"
"This scales from laptop to production GKE"
"Logo detection works with both local and cloud AI"
"Others have successfully followed my setup guide"
Today's Checklist:
GCP project setup with service account
K8s secrets for GCP credentials
Environment switching for AI backends
Local deployment with cloud services
Logo detection end-to-end test
Documentation for portfolio sharing
Ready to tackle the infrastructure setup? This approach gives you a real portfolio asset that others can learn from while showcasing production-ready patterns!
Your Infrastructure Focus Today:
✅ Hybrid K8s (KIND) + GCP services architecture
✅ Environment switching between local CLIP and Cloud Vision
✅ Portfolio-ready documentation approach
✅ $0 monthly cost with production scaling path
Key Deliverable:
Logo detection working end-to-end in production-ready K8s setup
- non-free GCP services should be swapped for local k8s services, preferably in an easily switchable way
- don't add things to .gitignore that you don't know will ever exist; just add what's needed as we go
docker check-up
Here's a fast "what changed → what to do" map so you don't nuke/prune on every edit.
# Quick map
| You changed… | Do this (fastest first) | Notes / good defaults |
| -------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
| **App code** (source files) | Nothing if you have livereload. Otherwise `docker compose restart <svc>` | Use bind mounts + a dev runner: Node `nodemon`, Python `uvicorn --reload`/Flask debug, Django `runserver`, Go `air`/`reflex`. |
| **Runtime config** inside container (env var values, flags) | `docker compose up -d --no-deps <svc>` (recreate) or `restart` | If the env is in `.env` or `compose.yml` `environment:`, a restart/recreate is enough. |
| **Config files read at startup** (e.g., `nginx.conf`, app config YAML) | `docker compose up -d --no-deps <svc>` | Bind-mount the config so you can `restart` instead of `rebuild`. |
| **Dependency lockfile** (`package-lock.json`, `poetry.lock`, `requirements.txt`, `go.mod/sum`, `Gemfile.lock`) | `docker compose build <svc>` then `up -d --no-deps <svc>` | Cache layers by copying lockfile before `COPY .`. See Dockerfile pattern below. |
| **Dockerfile** (but not the base image) | `docker compose build <svc>` then `up -d --no-deps <svc>` | BuildKit keeps layer cache; only changed layers rebuild. |
| **Base image tag** (e.g., `FROM node:20-bullseye` -> new tag or want latest security updates) | `docker compose build --pull <svc>` then `up -d --no-deps <svc>` | `--pull` refreshes the base. Use pinned tags in prod. |
| **Build args** (`ARG VAR=...` used in Dockerfile) | `docker compose build --no-cache --build-arg VAR=... <svc>` (if the arg affects earlier layers) | If the arg only affects late layers, drop `--no-cache`. |
| **Multi-service libraries** (shared package used by multiple services) | Rebuild every consumer: `docker compose build svc1 svc2` then `up -d --no-deps svc1 svc2` | Consider a shared base image stage to centralize caches. |
| **compose.yml** service definition (ports, volumes, healthchecks) | `docker compose up -d` | Compose detects what must be recreated. |
| **External dependency** (DB schema, migrations) | Run migration container/task; usually no rebuild | Keep DB in a **named volume** so rebuilds don't wipe data. |
| **Static assets** (built by a toolchain) | If built outside: restart only. If built inside: `build` that web service | Prefer building in a separate “builder” stage with a cache. |
| **Secrets** (files mounted via `secrets:` or env injected at runtime) | `restart` the service | Don't bake secrets into images → no rebuilds needed. |
| **Data in bind/named volumes** | Nothing (data persists) | Avoid pruning volumes unless you *want* to reset state. |
# Minimal dev patterns that avoid rebuilds
**Dockerfile (Node/Python example)**
```dockerfile
# syntax=docker/dockerfile:1.7
FROM node:20 AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci
FROM node:20 AS dev
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
# Copy only what's needed for runtime; source comes from a bind mount in dev
COPY package*.json ./
# e.g., vite/next with HMR (inline comments would break the exec-form CMD)
CMD ["npm", "run", "dev"]
```
For Python, copy `pyproject.toml`/`requirements.txt` into a deps layer first, install, then bind-mount your app code.
**docker-compose.yml (dev)**
```yaml
services:
web:
build:
context: .
target: dev
volumes:
- .:/app:cached
env_file: .env
ports:
- "3000:3000"
command: npm run dev
```
# Everyday command palette
* Restart a single service (pick up env/config):
`docker compose restart <svc>`
* Recreate a service without touching deps (uses existing image):
`docker compose up -d --no-deps <svc>`
* Rebuild only what changed (then recreate):
`docker compose build <svc> && docker compose up -d --no-deps <svc>`
* Rebuild with fresh base image:
`docker compose build --pull <svc> && docker compose up -d --no-deps <svc>`
* Tail logs:
`docker compose logs -f <svc>`
* Exec a shell:
`docker compose exec <svc> sh` (or `bash`)
* Clean **containers & images from dangling builds** (keep volumes!):
`docker image prune -f && docker container prune -f`
* Clean everything **except named volumes**:
`docker system prune -a` ← only when truly needed.
# Tips that keep you off the prune button
1. **Bind-mount code in dev** + a livereload command, so edits don't require rebuilds.
2. **Layer your Dockerfile** so deps install before copying the whole source:
* `COPY package*.json .` → install → **then** `COPY src .`
3. **Use BuildKit caches** for heavy steps (node, pip, apt):
* `RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt`
* `RUN --mount=type=cache,target=/root/.npm npm ci`
4. **Pin base images** (e.g., `node:20-bullseye`) and consciously use `--pull` when you want updates.
5. **Separate build and runtime** (multistage). Artifacts copied forward make rebuilds smaller.
6. **Keep data in named volumes**; never in the image. Pruning images wont touch your DB/files.
7. **.dockerignore** aggressively: exclude `node_modules` (if you install in image), `.git`, build outputs, tmp.
8. **Compose profiles** for optional services (e.g., `profiles: ["dev"]`) so you dont restart the world.
# Tiny Makefile (muscle memory)
```make
up: ## start everything
docker compose up -d
logs: ## tail logs from app
docker compose logs -f web
re: ## rebuild + recreate app only
docker compose build web && docker compose up -d --no-deps web
restart: ## restart app only
docker compose restart web
```
If you tell me your stack (language/framework + how you install deps), I'll tailor the Dockerfile and compose snippets so most edits are "save → livereload," lockfile changes are "build web," and prunes are basically never needed.
it's `docker compose`, not `docker-compose`