Routing to LLM Providers
Introduction
Envoy AI Gateway can front external LLM providers behind one OpenAI-compatible endpoint. It injects each provider's upstream credentials with a BackendSecurityPolicy, routes by model name, and fails over between providers. Consumers call a single internal address and never hold provider keys, so the gateway becomes the controlled egress point for public-cloud LLM traffic, with the same identity, quota, and metering applied to external models as to self-hosted ones.
Use Cases
- Expose a hosted model, such as one from OpenAI, AWS Bedrock, Azure OpenAI, GCP Vertex AI, or Anthropic, without distributing the provider key.
- Route different model names to different providers behind one endpoint.
- Fail over from a primary provider to a backup when the primary is unavailable.
Prerequisites
- Envoy AI Gateway is installed, with a Gateway and an AIGatewayRoute. Confirm that the relevant CRDs are present.
- The upstream provider credential (created in the next section) is stored in a Secret in the route namespace.
- The provider endpoint is reachable from the cluster egress. Verify this before going further.
Create the Gateway and AIGatewayRoute in a dedicated namespace (for example maas-system), not in the Envoy Gateway control-plane namespace envoy-gateway-system. A gateway placed in the control-plane namespace may not have the AI Gateway request-processing filter and SecurityPolicy applied to its listener, which silently breaks routing and policy enforcement. See Envoy AI Gateway.
Steps
Store the upstream credential
Create the Secret that holds the provider API key. For type: APIKey, the data-map key must be exactly apiKey. The BackendSecurityPolicy looks up that field by name, so a Secret created with --from-literal=key=... leaves the upstream call unauthenticated even though the policy reports Accepted.
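A declarative equivalent can make the required key name explicit. The manifest below is a minimal sketch: the Secret name openai-key and the namespace maas-system follow the examples used elsewhere in this guide, and the key value is a placeholder to replace.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: openai-key            # Secret name used in the Verification section
  namespace: maas-system      # example route namespace from the Prerequisites
type: Opaque
stringData:
  # The data key must be exactly "apiKey" for type: APIKey.
  apiKey: "REPLACE_WITH_PROVIDER_API_KEY"   # placeholder, not a real key
```

Using stringData avoids manual base64 encoding; the API server stores the value under data.apiKey.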
Inject the provider credential with a BackendSecurityPolicy that targets the backend. The type field selects the provider authentication scheme.
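A policy of this kind might look like the following sketch. It assumes the v1alpha1 API with targetRefs-based attachment and an AIServiceBackend named openai (a hypothetical name); field layout has changed between releases, so check it against the installed CRD, for example with kubectl explain backendsecuritypolicy.spec.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: openai-apikey         # hypothetical policy name
  namespace: maas-system      # example namespace from the Prerequisites
spec:
  # Attach the policy to the AIServiceBackend it should authenticate.
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: openai            # hypothetical backend name
  type: APIKey                # selects the authentication scheme
  apiKey:
    secretRef:
      name: openai-key        # Secret holding the apiKey field
      namespace: maas-system
```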
The type field accepts APIKey, AWSCredentials, AzureAPIKey, AzureCredentials, GCPCredentials, and AnthropicAPIKey. Each type expects a matching credential block and a matching set of Secret data keys:
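As one illustration of a non-APIKey type, the sketch below shows an AWSCredentials policy that reads a static credentials file from a Secret. The names bedrock-credentials, bedrock, and aws-credentials are hypothetical, and the assumption that the Secret stores the file under the data key credentials should be verified against the API reference for the installed release.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: bedrock-credentials   # hypothetical policy name
  namespace: maas-system
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: bedrock           # hypothetical AWS Bedrock backend
  type: AWSCredentials
  awsCredentials:             # credential block matching the type
    region: us-east-1         # example region
    credentialsFile:
      secretRef:
        # Assumption: the Secret holds an AWS credentials file
        # under the data key "credentials".
        name: aws-credentials
```

The pattern is the same for the other types: the credential block is named after the type, and the referenced Secret must use the data key that block expects.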
When the upstream auth scheme is wrong, the upstream typically returns 401 or 403. A wrongly keyed Secret (for example key: instead of apiKey:) is harder to diagnose. The BackendSecurityPolicy still reports Accepted=True, but the controller logs failed to get backend auth from backend security policy. Skipping this backend. ... error: secret <name> does not contain key apiKey and removes the backend from the route, so requests to that backend time out rather than returning a clean 401. Tail the controller logs when introducing a new credential.
Define the provider backend
Two resources work together: a Backend (Envoy Gateway) tells the data plane the network endpoint to reach, and an AIServiceBackend (Envoy AI Gateway) tells the AI filter the provider schema to translate to.
First, declare the upstream endpoint as a Backend:
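A minimal sketch of such a Backend, assuming the OpenAI public endpoint and the hypothetical resource name openai:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: openai                # hypothetical name, referenced by the AIServiceBackend
  namespace: maas-system      # example namespace from the Prerequisites
spec:
  endpoints:
    - fqdn:
        hostname: api.openai.com   # public provider endpoint
        port: 443                  # HTTPS
```

TLS to the upstream is typically configured separately, for example with a Gateway API BackendTLSPolicy that targets this Backend; that resource is omitted here.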
Then register the provider as an AIServiceBackend referencing that Backend:
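The registration could be sketched as follows, assuming a Backend named openai (hypothetical) in the same namespace:

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: openai                # hypothetical name, referenced by the route
  namespace: maas-system      # example namespace from the Prerequisites
spec:
  schema:
    name: OpenAI              # upstream protocol the gateway speaks
  backendRef:
    name: openai              # the Backend that declares the endpoint
    kind: Backend
    group: gateway.envoyproxy.io
```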
- schema.name: the upstream protocol the gateway must speak. Common values: OpenAI, AWSBedrock, AzureOpenAI, GCPVertexAI, Anthropic. The gateway transcodes the incoming OpenAI-compatible request to this schema before forwarding.
- backendRef: must point at a Backend (group gateway.envoyproxy.io), not a Service. The AI filter relies on the Backend for FQDN and TLS handling to public endpoints.
Confirm both resources reconciled:
Route by model with fallback
Reference multiple backends in one AIGatewayRoute rule and set priority so the gateway fails over from the primary to the backup:
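A route of this shape might look like the sketch below. The Gateway name maas-gateway and the backend names openai and azure-openai are hypothetical. Depending on the installed release, the route may attach with targetRefs instead of parentRefs and may also require a spec.schema block, so check the fields with kubectl explain aigatewayroute.spec before applying.

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llm-providers         # hypothetical route name
  namespace: maas-system      # example namespace from the Prerequisites
spec:
  parentRefs:
    - name: maas-gateway      # hypothetical Gateway name
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model   # header written by the AI filter
              value: gpt-4o         # model name from the request body
      backendRefs:
        - name: openai        # primary AIServiceBackend
          priority: 0
        - name: azure-openai  # hypothetical backup AIServiceBackend
          priority: 1
```

Each additional model name gets its own rule with its own match and backendRefs list.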
- matches.headers[x-ai-eg-model]: the AI filter parses the model field from the request body and writes it to this header for routing. A request body containing "model":"gpt-4o" therefore matches this rule without the caller setting any header.
- priority: Envoy uses priority-based load balancing. Priority-0 endpoints take traffic while they are healthy; traffic spills over to the priority-1 group as priority-0 endpoints are ejected by outlier detection, and shifts entirely once none remain healthy. Failover is automatic but takes seconds, not milliseconds; do not rely on it for tail-latency budgets.
Verification
Send an OpenAI-compatible request to the gateway and confirm it reaches the provider, the client's Authorization header is replaced by the BSP-injected key, and a valid response with token usage comes back:
A successful response (200 OK with a usage object) means the upstream credential was injected and the route resolved. Inspect the request that actually hit the upstream by enabling Envoy's access log on the EnvoyProxy resource, or temporarily route to a debug echo backend; the upstream-bound Authorization header should carry the value from the openai-key Secret, not the bogus client value above.
To exercise failover, simulate a primary outage by pointing the primary Backend at an unreachable host (hostname: invalid.example.invalid) for a few seconds and watch traffic shift to the backup; the response body's model field will reflect the new provider.