Inference Gateway
Coming soonOne API for many models. Built for production.
What it does
Inference Gateway sits in front of your Inference Endpoints and adds request routing across models, per-key rate limiting, usage logs, and a uniform API across different model runtimes. Swap LLaMA-3.1 for Mistral without touching client code.
Built for teams that have moved past 'one model, one endpoint' and want production-grade observability on their AI traffic. Compatible with the OpenAI Chat Completions API. Logs every request to Object Storage.
Why Chess Cloud
- Route requests across multiple models from one API
- Per-key rate limiting and usage logging
- Compatible with OpenAI Chat Completions API
- Swap models without touching client code
Coming soon
Inference Gateway is not available yet. Leave your email and we'll notify you when it launches.