implement robust event-driven file watching system

This commit is contained in:
buenosairesam
2025-08-25 04:03:56 -03:00
parent cdab5191e4
commit dd3e9769a5
16 changed files with 11808 additions and 421 deletions


@@ -0,0 +1,161 @@
Two issues here:
1. Container isolation problem - File-watcher creates temp directory in its container, but
celery-logo can't access it (different containers)
2. Strategy logging spam - "Configured execution strategy: local" logs on every task
Let me fix both:
Fix 1: Use shared volume instead of container-local temp directory
● Update(media-analyzer/backend/streaming/file_watcher.py)
 User rejected update to media-analyzer/backend/streaming/file_watcher.py
18 self.processed_files = set()
19 self.analyzer = VideoAnalyzer()
20
21 - # Create a persistent directory for analysis segments
21 + # Create a persistent directory for analysis segments in shared volume
22 self.analysis_dir = self.media_dir / 'analysis_segments'
23 try:
24 - self.analysis_dir.mkdir(exist_ok=True)
24 + self.analysis_dir.mkdir(exist_ok=True, mode=0o755)
25 + logger.info(f"File watcher: Using analysis directory: {self.analysis_dir}")
26 except PermissionError:
27 - logger.warning(f"File watcher: Cannot create {self.analysis_dir}, using temp directory")
27 + logger.warning(f"File watcher: Cannot create {self.analysis_dir}, using shared temp directory")
28 + # Use temp directory within the shared volume so celery can access it
29 import tempfile
30 - self.analysis_dir = Path(tempfile.mkdtemp(prefix='analysis_segments_'))
30 + self.analysis_dir = Path(tempfile.mkdtemp(dir=self.media_dir, prefix='analysis_'))
31 logger.info(f"File watcher: Using temporary analysis directory: {self.analysis_dir}")
32
33 # Track copied files for cleanup
> how this architecture design would scale when moving to k8s or hybrid deployments
● Excellent question! The current architecture has scaling challenges that need addressing for K8s/hybrid deployments:
Current Architecture Limitations
Single Point of Failure:
File-watcher (1 instance) → Temp Directory → Multiple Celery Workers
- Only one file-watcher instance can run
- Temp directory is container-local (doesn't work across pods)
- No coordination between file-watchers if scaled
K8s-Ready Architecture Design
Option 1: Event-Driven with Message Queues
RTMP/HLS → File Events → Redis Streams → Celery Tasks → Analysis Results
Implementation:
- File events go directly to Redis/RabbitMQ instead of filesystem watching
- Multiple file-watcher pods can process different streams
- Celery workers scale independently across nodes
- No shared filesystem dependency
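The Option 1 flow can be sketched end to end. This is a minimal in-process stand-in that models the stream with a queue so the pattern is visible; a real deployment would use redis-py's `xadd`/`xreadgroup` consumer groups, and all names here are illustrative, not project code:

```python
import json
import queue

# In-memory stand-in for a Redis Stream / RabbitMQ queue.
# In production this would be redis.Redis().xadd(...) on the watcher side
# and xreadgroup(...) on the worker side.
segment_events: "queue.Queue" = queue.Queue()

def publish_segment_event(stream_key: str, path: str) -> None:
    # The watcher publishes a small JSON event instead of copying files.
    segment_events.put(json.dumps({"stream_key": stream_key, "path": path}))

def worker(results: list) -> None:
    # A Celery-like worker drains events; many workers can share one queue.
    while True:
        raw = segment_events.get()
        if raw is None:  # sentinel for shutdown in this sketch
            break
        event = json.loads(raw)
        results.append(f"analyzed {event['stream_key']}:{event['path']}")
```

Because the watcher only emits metadata, scaling out is a matter of adding consumers to the stream's consumer group.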
Option 2: Object Storage + Event Notifications
RTMP → Cloud Storage → Storage Events → Lambda/K8s Jobs → Analysis
GCP Implementation:
# segments uploaded to GCS trigger Cloud Functions
# Cloud Functions push tasks to Cloud Tasks/Pub/Sub
# Celery workers pull from Pub/Sub instead of file watching
Option 3: StatefulSet + Persistent Volumes
File-watcher StatefulSet → PVC → Celery Deployment
Current Architecture Enhanced:
- File-watcher as StatefulSet with persistent volume
- Analysis segments on shared PVC (ReadWriteMany)
- Multiple celery pods access same PVC
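As a sanity check of the Option 3 layout, the sketch below simulates the shared ReadWriteMany mount with a temp directory; the mount path and function names are assumptions for illustration, not project code:

```python
import tempfile
from pathlib import Path

# Simulate a ReadWriteMany PVC mounted at the same path in both pods.
# In K8s both containers would mount something like /shared-storage.
shared_mount = Path(tempfile.mkdtemp(prefix="pvc_"))

def watcher_write_segment(name: str, data: bytes) -> Path:
    # File-watcher pod: write a segment into the shared volume.
    analysis_dir = shared_mount / "analysis_segments"
    analysis_dir.mkdir(exist_ok=True)
    target = analysis_dir / name
    target.write_bytes(data)
    return target

def celery_read_segment(name: str) -> bytes:
    # Celery pod: read the same path through its own mount of the PVC.
    return (shared_mount / "analysis_segments" / name).read_bytes()
```

The caveat is that ReadWriteMany support depends on the storage class (NFS, Filestore, EFS); plain block-storage PVCs are usually ReadWriteOnce.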
Hybrid Deployment Patterns
Local Development
docker compose up # Current setup works fine
Single VM K8s (k3s/kind)
# Use hostPath volumes
volumes:
  - name: media-storage
    hostPath:
      path: /opt/media-analyzer/segments
Cloud K8s (GKE/EKS)
# Use cloud storage + events
env:
  - name: USE_CLOUD_STORAGE
    value: "true"
  - name: GCS_BUCKET
    value: "media-analyzer-segments"
Recommended Migration Path
Phase 1: Make Current Architecture K8s-Compatible
# Add environment-aware file watching (inside the watcher's __init__)
if os.getenv('KUBERNETES_SERVICE_HOST'):
    # Use shared PVC path
    self.analysis_dir = Path('/shared-storage/analysis_segments')
else:
    # Use local docker volume
    self.analysis_dir = self.media_dir / 'analysis_segments'
Phase 2: Event-Driven Refactor
# Replace file system events with message events
class VideoSegmentProcessor:
def handle_new_segment(self, segment_event):
# segment_event contains: bucket, path, stream_key
# No file copying needed - process from object storage
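Filling out the Phase 2 sketch above: a self-contained version of the handler, assuming events arrive as dicts with `bucket`, `path`, and `stream_key`; the injected `analyze` callable is a placeholder standing in for the real VideoAnalyzer:

```python
class VideoSegmentProcessor:
    """Handles segment events from a message bus instead of inotify."""

    def __init__(self, analyze=None):
        # `analyze` is injected so the real VideoAnalyzer (or a stub in
        # tests) can be swapped in; the default just echoes its inputs.
        self.analyze = analyze or (lambda uri, stream_key: {"uri": uri, "stream": stream_key})

    def handle_new_segment(self, segment_event: dict) -> dict:
        # The event carries bucket, path, stream_key - no local file copy.
        uri = f"gs://{segment_event['bucket']}/{segment_event['path']}"
        return self.analyze(uri, segment_event["stream_key"])
```

With this shape, the same handler works whether events come from Pub/Sub, Redis Streams, or a Celery task payload.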
Phase 3: Full Cloud-Native
# Kubernetes CronJob for cleanup instead of in-process cleanup
# Horizontal Pod Autoscaler based on queue depth
# Cloud storage for persistence
# Cloud events for coordination
Performance & Scaling Considerations
Current Bottlenecks:
- Single file-watcher instance
- File copying overhead
- Shared filesystem dependency
K8s Solutions:
- HPA: Scale celery workers based on Redis queue length
- Multiple file-watchers: One per stream or region
- Node affinity: Co-locate file-watcher with RTMP pods
- Resource quotas: Prevent memory leaks from affecting cluster
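The HPA bullet can be made concrete. Kubernetes scales on desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue); applied to queue depth per worker it looks like this (target numbers are illustrative):

```python
import math

def desired_replicas(current_replicas: int, queue_length: int, target_per_worker: int) -> int:
    # Kubernetes HPA formula: ceil(current * currentMetric / desiredMetric),
    # with the metric here being queued segments per worker.
    current_per_worker = queue_length / current_replicas
    return math.ceil(current_replicas * current_per_worker / target_per_worker)
```

So with 2 workers, 100 queued segments, and a target of 10 segments per worker, the HPA scales to 10 replicas. Exposing the Redis queue length as a custom metric (e.g. via a Prometheus adapter) is what makes this wiring possible.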
Configuration Strategy
# Environment-aware configuration
import os

class LocalFileSystem: pass   # placeholder storage backends for illustration
class SharedVolume: pass
class CloudStorage: pass

class ScalingConfig:
    def __init__(self):
        self.deployment_type = self.detect_environment()

    def detect_environment(self):
        # K8s injects KUBERNETES_SERVICE_HOST into every pod
        if os.getenv('KUBERNETES_SERVICE_HOST'):
            if os.getenv('GCP_PROJECT_ID'):
                return 'gke'
            return 'k8s'
        return 'docker'

    def get_storage_backend(self):
        return {
            'docker': LocalFileSystem(),
            'k8s': SharedVolume(),
            'gke': CloudStorage()
        }[self.deployment_type]
The current architecture works well for Docker Compose but needs the event-driven refactor for true K8s scalability. The hybrid approach lets you migrate incrementally while maintaining backward
compatibility.