Inference Gateway

Coming soon

One API for many models. Built for production.

What it does

Inference Gateway sits in front of your Inference Endpoints and adds request routing across models, per-key rate limiting, usage logs, and a uniform API across different model runtimes. Swap LLaMA-3.1 for Mistral without touching client code.

Built for teams that have moved past 'one model, one endpoint' and want production-grade observability on their AI traffic. Compatible with the OpenAI Chat Completions API. Logs every request to Object Storage.

Why Chess Cloud

  • Route requests across multiple models from one API
  • Per-key rate limiting and usage logging
  • Compatible with OpenAI Chat Completions API
  • Swap models without touching client code

Coming soon

Inference Gateway is not available yet. Leave your email and we'll notify you when it launches.