Currently operating as a brainstorming page
Request Routing
Request Processing Flow (both transcoding and AI)
- Request Validation: OpenAPI validation middleware validates request structure
- Session Selection: AISessionManager selects appropriate orchestrator based on model capability
- Payment Processing: Calculates payment based on pixel count for non-live endpoints
- Model Execution: Sends request to AI worker with specified model
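Conceptually, these stages run in sequence inside a single request handler. The sketch below only illustrates that ordering; the function and type names (validateRequest, selectSession, calculatePayment, submitJob, AIRequest) are hypothetical stand-ins, not the actual APIs in ai_mediaserver.go or AISessionManager.

```go
// Illustrative sketch of the request processing flow. All names below are
// hypothetical stand-ins, not the real go-livepeer API.
package main

import (
	"fmt"
	"net/http"
)

type AIRequest struct {
	ModelID string
	Width   int
	Height  int
	Outputs int
}

func handleAIRequest(w http.ResponseWriter, r *http.Request) {
	req, err := validateRequest(r) // 1. OpenAPI-style structural validation
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	sess, err := selectSession(req.ModelID) // 2. orchestrator with the required model capability
	if err != nil {
		http.Error(w, err.Error(), http.StatusServiceUnavailable)
		return
	}

	fee := calculatePayment(req) // 3. pixel-based payment for non-live endpoints

	result, err := submitJob(sess, req, fee) // 4. dispatch to the AI worker
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	fmt.Fprint(w, result)
}

// Stubs standing in for the real validation, selection, payment, and dispatch logic.
func validateRequest(r *http.Request) (AIRequest, error) {
	return AIRequest{ModelID: "example/model", Width: 1024, Height: 1024, Outputs: 1}, nil
}
func selectSession(modelID string) (string, error) { return "orchestrator-0", nil }
func calculatePayment(req AIRequest) int64 {
	return int64(req.Width) * int64(req.Height) * int64(req.Outputs)
}
func submitJob(sess string, req AIRequest, fee int64) (string, error) { return "ok", nil }

func main() {
	http.HandleFunc("/ai", handleAIRequest)
	_ = http.ListenAndServe(":8080", nil)
}
```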
Request Processing Flow (diagram)
Transcoding Requests
Traditional video transcoding requests are handled through:
- RTMP ingest: Port 1935 by default
- HTTP push: /live/{streamKey} endpoint when -httpIngest is enabled
- HLS output: Adaptive bitrate streams for playback
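As a rough illustration of the HTTP push path, a pre-encoded segment can be sent directly to the /live/{streamKey} endpoint when -httpIngest is enabled. The gateway address, port, and segment naming below are assumptions for the example only; check the node's configured HTTP ingest address.

```go
// Hypothetical HTTP push of a single segment to the /live/{streamKey} endpoint.
// The gateway address, port, and path layout are assumptions for illustration.
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	segment, err := os.ReadFile("seg0.ts") // a pre-encoded MPEG-TS segment
	if err != nil {
		log.Fatal(err)
	}

	url := "http://localhost:8935/live/mystream/0.ts" // assumed address and segment naming
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(segment))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Content-Type", "video/mp2t")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("push status:", resp.Status)
}
```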
AI Requests
AI processing requests are routed through dedicated endpoints defined in ai_mediaserver.go (fixme). The OpenAPI spec is at ai/worker/api/openapi.json.
- Generate images from text prompts. Uses jsonDecoder for parsing.
- Transform images with prompts. Uses multipartDecoder for file uploads.
- Create videos from images. Uses multipartDecoder for file uploads.
- Upscale (enhance) images to higher resolution. Uses multipartDecoder for file uploads.
- Apply transformations to a live video streamed to the returned endpoints. The live video endpoint has specialized handling for real-time streaming with MediaMTX integration.
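As an example of the JSON path, a text-to-image style request can be posted with the prompt in the body, while the image-based endpoints expect multipart uploads instead. The endpoint path, port, and field names below are assumptions for illustration; the authoritative request shapes are in ai/worker/api/openapi.json.

```go
// Hypothetical JSON request to a text-to-image style endpoint.
// Path, port, and field names are assumptions; consult ai/worker/api/openapi.json.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"prompt":   "a watercolor painting of a lighthouse at dusk",
		"model_id": "example/model", // assumed field name for model selection
	})

	resp, err := http.Post("http://localhost:8935/text-to-image", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(out))
}
```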
Payment Models
The dual setup handles two different payment models:

Transcoding Payments
- Basis: Per video segment processed
- Method: Payment tickets sent with each segment
- Verification: Multi-orchestrator verification for quality assurance
AI Payments
- Basis: Per pixel processed (width × height × outputs)
- Method: Pixel-based payment calculation
- Live Video: Interval-based payments during streaming
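A back-of-the-envelope version of the pixel-based calculation multiplies width × height × outputs by a price per pixel. The helper below is a simplified sketch for intuition (the price value is made up), not the exact accounting the node performs.

```go
// Sketch of pixel-based payment estimation: pixels = width × height × outputs.
// The price per pixel and the overall formula are simplified assumptions.
package main

import "fmt"

func estimateFee(width, height, outputs int, pricePerPixel float64) float64 {
	pixels := int64(width) * int64(height) * int64(outputs)
	return float64(pixels) * pricePerPixel
}

func main() {
	// e.g. one 1024×1024 image at an assumed price of 3e-9 units per pixel
	fee := estimateFee(1024, 1024, 1, 3e-9)
	fmt.Printf("estimated fee: %.6f units\n", fee)
}
```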
Operational Considerations
Resource Allocation
When running a dual setup, consider:
- GPU resources: Shared between transcoding and AI workloads
- Memory: AI models require significant RAM when loaded (“warm”)
- Network: Bandwidth for both stream ingest and AI request/response
Monitoring
Monitor both workload types:
- Transcoding: Segment processing latency, success rates
- AI: Model loading times, inference latency, pixel processing rates
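A minimal way to start collecting per-workload latency at the gateway is a timing wrapper around each handler, as in the generic sketch below. This is not go-livepeer's built-in monitoring, just an illustration of the measurement; the paths and workload labels are assumptions.

```go
// Generic latency-logging middleware sketch; the node's own monitoring
// exposes richer metrics than this.
package main

import (
	"log"
	"net/http"
	"time"
)

func timed(workload string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		log.Printf("workload=%s path=%s latency=%s", workload, r.URL.Path, time.Since(start))
	})
}

func main() {
	// Placeholder handlers; in practice these would wrap the real ingest and AI routes.
	http.Handle("/live/", timed("transcoding", http.NotFoundHandler()))
	http.Handle("/text-to-image", timed("ai", http.NotFoundHandler()))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```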
Scaling Strategies
- Horizontal: Deploy multiple gateway instances behind a load balancer
- Vertical: Allocate more GPU resources for AI model parallelism
- Specialized: Separate nodes for transcoding vs AI based on workload patterns