Description
The current deployment architecture assumes a single, specific routing/gateway setup. We should support multiple router/gateway options so that deployments can match the routing stack already used in different production environments.
Acceptance Criteria
- Identify common router/gateway options (e.g., Istio, NGINX, Envoy, KServe predictor)
- Define configuration schema for router selection
- Create deployment templates for each router type
- Add UI option to select router/gateway during deployment
- Document routing architecture for each option
- Test with KV cache-aware routing (see work item #11, "Add llm-d as deployment target alongside KServe/vLLM")
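For the configuration-schema criterion, a minimal sketch of what router selection might look like is shown here. Everything in it is hypothetical: the `RouterType` values, the `RoutingConfig` fields (`router`, `replicas`, `kv_cache_aware`), and the rule that KV cache-aware routing needs at least two replicas are assumptions for illustration, not an agreed schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Mapping


class RouterType(str, Enum):
    """Router/gateway options named in this issue (hypothetical identifiers)."""
    ISTIO = "istio"
    NGINX = "nginx"
    ENVOY = "envoy"
    KSERVE_PREDICTOR = "kserve-predictor"


@dataclass(frozen=True)
class RoutingConfig:
    """Hypothetical deployment-time routing settings."""
    router: RouterType
    replicas: int = 1
    kv_cache_aware: bool = False

    def __post_init__(self) -> None:
        if self.replicas < 1:
            raise ValueError("replicas must be >= 1")
        # Assumption: KV cache-aware routing only matters when there is
        # more than one replica to choose between.
        if self.kv_cache_aware and self.replicas < 2:
            raise ValueError("kv_cache_aware routing requires replicas >= 2")


def parse_routing_config(raw: Mapping[str, Any]) -> RoutingConfig:
    """Build a RoutingConfig from a plain dict (e.g. parsed YAML or JSON)."""
    try:
        router = RouterType(raw["router"])
    except KeyError:
        raise ValueError("missing required key 'router'") from None
    except ValueError:
        allowed = ", ".join(r.value for r in RouterType)
        raise ValueError(
            f"unknown router {raw['router']!r}; expected one of: {allowed}"
        ) from None
    return RoutingConfig(
        router=router,
        replicas=int(raw.get("replicas", 1)),
        kv_cache_aware=bool(raw.get("kv_cache_aware", False)),
    )
```

For example, `parse_routing_config({"router": "envoy", "replicas": 3, "kv_cache_aware": True})` yields a validated config, while an unknown router name fails early with a message listing the supported options. The deployment templates and the UI selector could then both be driven from the same `RouterType` enum.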
Notes
- Routing impacts latency, throughput, and KV cache efficiency
- Should support KV cache-aware routing for multi-replica deployments (llm-d feature)
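To make the KV cache point concrete, here is a simplified prefix-affinity heuristic: send each request to the replica whose (simulated) KV cache already holds the longest prefix of the prompt, and break ties by the fewest active requests. This is an illustrative stand-in, not llm-d's actual scheduler; `BLOCK_SIZE`, `Replica`, and `route` are hypothetical names, and real systems use chained block hashes instead of the full-prefix tuples used here for simplicity.

```python
from dataclasses import dataclass, field
from typing import Sequence

BLOCK_SIZE = 4  # hypothetical: tokens per KV cache block


def prefix_block_keys(tokens: Sequence[int], block_size: int = BLOCK_SIZE) -> list[tuple[int, ...]]:
    """One key per full block; key i identifies the prefix tokens[0:(i+1)*block_size].

    Full-prefix tuples keep the sketch simple; real systems hash chained blocks.
    """
    n_full = len(tokens) // block_size
    return [tuple(tokens[: (i + 1) * block_size]) for i in range(n_full)]


@dataclass
class Replica:
    name: str
    active_requests: int = 0
    cached_blocks: set[tuple[int, ...]] = field(default_factory=set)

    def cached_prefix_blocks(self, keys: list[tuple[int, ...]]) -> int:
        """Count leading prompt blocks already present in this replica's cache."""
        hits = 0
        for key in keys:
            if key not in self.cached_blocks:
                break
            hits += 1
        return hits


def route(tokens: Sequence[int], replicas: list[Replica]) -> Replica:
    """Pick the replica with the longest cached prefix; ties go to the least loaded."""
    keys = prefix_block_keys(tokens)
    best = max(replicas, key=lambda r: (r.cached_prefix_blocks(keys), -r.active_requests))
    # Simulate serving the request: the chosen replica now caches this prefix.
    best.cached_blocks.update(keys)
    best.active_requests += 1
    return best
```

A request that shares a long prompt prefix with earlier traffic lands on the replica that can reuse that cache, while unrelated prompts fall back to the least loaded replica. That trade-off between cache reuse and load balance is where routing choice affects latency, throughput, and KV cache efficiency.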