Two issues here:

1. Container isolation problem - File-watcher creates temp directory in its container, but celery-logo can't access it (different containers)
2. Strategy logging spam - "Configured execution strategy: local" logs on every task

Let me fix both:

Fix 1: Use shared volume instead of container-local temp directory

● Update(media-analyzer/backend/streaming/file_watcher.py)
  ⎿ User rejected update to media-analyzer/backend/streaming/file_watcher.py
       18            self.processed_files = set()
       19            self.analyzer = VideoAnalyzer()
       20
       21 -          # Create a persistent directory for analysis segments
       21 +          # Create a persistent directory for analysis segments in shared volume
       22            self.analysis_dir = self.media_dir / 'analysis_segments'
       23            try:
       24 -              self.analysis_dir.mkdir(exist_ok=True)
       24 +              self.analysis_dir.mkdir(exist_ok=True, mode=0o755)
       25 +              logger.info(f"File watcher: Using analysis directory: {self.analysis_dir}")
       26            except PermissionError:
       27 -              logger.warning(f"File watcher: Cannot create {self.analysis_dir}, using temp directory")
       27 +              logger.warning(f"File watcher: Cannot create {self.analysis_dir}, using shared temp directory")
       28 +              # Use temp directory within the shared volume so celery can access it
       29                import tempfile
       30 -              self.analysis_dir = Path(tempfile.mkdtemp(prefix='analysis_segments_'))
       30 +              self.analysis_dir = Path(tempfile.mkdtemp(dir=self.media_dir, prefix='analysis_'))
       31                logger.info(f"File watcher: Using temporary analysis directory: {self.analysis_dir}")
       32
       33            # Track copied files for cleanup

> how would this architecture design scale when moving to k8s or hybrid deployments

● Excellent question! The current architecture has scaling challenges that need addressing for K8s/hybrid deployments:

Current Architecture Limitations

Single Point of Failure:
  File-watcher (1 instance) → Temp Directory → Multiple Celery Workers
- Only one file-watcher instance can run
- Temp directory is container-local (doesn't work across pods)
- No coordination between file-watchers if scaled

K8s-Ready Architecture Design

Option 1: Event-Driven with Message Queues

  RTMP/HLS → File Events → Redis Streams → Celery Tasks → Analysis Results

Implementation:
- File events go directly to Redis/RabbitMQ instead of filesystem watching
- Multiple file-watcher pods can process different streams
- Celery workers scale independently across nodes
- No shared filesystem dependency

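A minimal sketch of what Option 1 could look like with Redis Streams via redis-py; the stream name `segment_events`, the consumer-group name, and all helper names here are assumptions for illustration, not part of the existing codebase:

```python
from typing import Dict


def build_segment_event(stream_key: str, segment_path: str) -> Dict[str, str]:
    # Serialize a new-segment notification; this replaces filesystem watching
    return {"stream_key": stream_key, "path": segment_path}


def publish_segment_event(r, stream_key: str, segment_path: str):
    """Append the event to a Redis Stream ('r' is a redis.Redis client).
    Each file-watcher pod publishes for its own streams, no coordination needed."""
    return r.xadd("segment_events", build_segment_event(stream_key, segment_path))


def consume_segment_events(r, handler, group="analyzers", consumer="worker-1"):
    """Read events through a consumer group so multiple worker pods split the
    load; XACK marks an event done, and unacked events can be re-delivered."""
    try:
        r.xgroup_create("segment_events", group, id="0", mkstream=True)
    except Exception:
        pass  # group already exists
    for _stream, messages in r.xreadgroup(group, consumer,
                                          {"segment_events": ">"},
                                          count=10, block=5000):
        for msg_id, fields in messages:
            handler(fields)  # hand off to the analysis task
            r.xack("segment_events", group, msg_id)
```

Because events are load-balanced by the consumer group rather than by who sees a file first, adding worker pods needs no changes anywhere else.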
Option 2: Object Storage + Event Notifications

  RTMP → Cloud Storage → Storage Events → Lambda/K8s Jobs → Analysis

GCP Implementation:
  # segments uploaded to GCS trigger Cloud Functions
  # Cloud Functions push tasks to Cloud Tasks/Pub/Sub
  # Celery workers pull from Pub/Sub instead of file watching

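A sketch of the GCS-to-Pub/Sub bridge for Option 2. The function names and the `<stream_key>/<segment>` object layout are assumptions; the `bucket`/`name` fields follow the GCS object-finalize event schema:

```python
import json


def build_pubsub_payload(event: dict) -> bytes:
    """Turn a GCS object-finalize event into the task message workers consume.
    Assumes objects are laid out as <stream_key>/<segment_file>."""
    return json.dumps({
        "bucket": event["bucket"],
        "path": event["name"],
        "stream_key": event["name"].split("/")[0],
    }).encode("utf-8")


def on_segment_uploaded(event, context, publisher=None, topic_path=None):
    """Cloud Functions entry point for google.storage.object.finalize events.
    'publisher' would be a google.cloud.pubsub_v1.PublisherClient; it is
    injected here so the payload logic stays testable without GCP."""
    payload = build_pubsub_payload(event)
    if publisher is not None:
        publisher.publish(topic_path, data=payload)
    return payload
```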
Option 3: StatefulSet + Persistent Volumes

  File-watcher StatefulSet → PVC → Celery Deployment

Current Architecture Enhanced:
- File-watcher as StatefulSet with persistent volume
- Analysis segments on shared PVC (ReadWriteMany)
- Multiple celery pods access same PVC

Hybrid Deployment Patterns

Local Development

  docker compose up  # Current setup works fine

Single VM K8s (k3s/kind)

  # Use hostPath volumes
  volumes:
    - name: media-storage
      hostPath:
        path: /opt/media-analyzer/segments

Cloud K8s (GKE/EKS)

  # Use cloud storage + events
  env:
    - name: USE_CLOUD_STORAGE
      value: "true"
    - name: GCS_BUCKET
      value: "media-analyzer-segments"

Recommended Migration Path

Phase 1: Make Current Architecture K8s-Compatible

  # Add environment-aware file watching (inside the file-watcher's __init__)
  import os
  from pathlib import Path

  if os.getenv('KUBERNETES_SERVICE_HOST'):
      # Use shared PVC path
      self.analysis_dir = Path('/shared-storage/analysis_segments')
  else:
      # Use local docker volume
      self.analysis_dir = self.media_dir / 'analysis_segments'

Phase 2: Event-Driven Refactor

  # Replace file system events with message events
  class VideoSegmentProcessor:
      def handle_new_segment(self, segment_event):
          # segment_event contains: bucket, path, stream_key
          # No file copying needed - process from object storage
          ...

Phase 3: Full Cloud-Native

  # Kubernetes CronJob for cleanup instead of in-process cleanup
  # Horizontal Pod Autoscaler based on queue depth
  # Cloud storage for persistence
  # Cloud events for coordination

Performance & Scaling Considerations

Current Bottlenecks:
- Single file-watcher instance
- File copying overhead
- Shared filesystem dependency

K8s Solutions:
- HPA: Scale celery workers based on Redis queue length
- Multiple file-watchers: One per stream or region
- Node affinity: Co-locate file-watcher with RTMP pods
- Resource quotas: Prevent memory leaks from affecting cluster

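The HPA-on-queue-depth idea needs a queue-length metric to scale on. A sketch, assuming the Redis broker, where each Celery queue is a plain Redis list (the default queue is named "celery"):

```python
def celery_queue_depth(r, queue: str = "celery") -> int:
    """Return the number of pending tasks in a Celery queue on a Redis broker.

    With the Redis broker, each queue is a Redis list, so LLEN reports pending
    (not-yet-reserved) tasks. A small exporter polling this value can feed a
    custom metric for the Horizontal Pod Autoscaler. 'r' is a redis.Redis client."""
    return int(r.llen(queue))
```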
Configuration Strategy

  # Environment-aware configuration
  import os

  class ScalingConfig:
      def __init__(self):
          self.deployment_type = self.detect_environment()

      def detect_environment(self):
          if os.getenv('KUBERNETES_SERVICE_HOST'):
              if os.getenv('GCP_PROJECT_ID'):
                  return 'gke'
              return 'k8s'
          return 'docker'

      def get_storage_backend(self):
          # LocalFileSystem/SharedVolume/CloudStorage are app-specific backends;
          # look up the class and instantiate only the one that is needed
          backends = {
              'docker': LocalFileSystem,
              'k8s': SharedVolume,
              'gke': CloudStorage,
          }
          return backends[self.deployment_type]()

The current architecture works well for Docker Compose but needs the event-driven refactor for true K8s scalability. The hybrid approach lets you migrate incrementally while maintaining backward compatibility.