Inference Gateway

Coming soon

One API for many models. Built for production.

What it does

Inference Gateway sits in front of your Inference Endpoints and adds request routing across models, per-key rate limiting, usage logs, and a uniform API across different model runtimes. Swap LLaMA-3.1 for Mistral without touching client code.

Built for teams that have moved past 'one model, one endpoint' and want production-grade observability on their AI traffic. Compatible with the OpenAI Chat Completions API. Logs every request to Object Storage.

Why Chess Cloud

Route requests across multiple models from one API
Per-key rate limiting and usage logging
Compatible with OpenAI Chat Completions API
Swap models without touching client code

Coming soon

Inference Gateway is not available yet. Leave your email and we'll notify you when it launches.

Related services

AI Inference Endpoints

Wrap any model in a container and get a stable HTTPS URL with request-level metering and automatic scaling.

Fine-tuning Jobs

Fine-tune open-weight models with your dataset. Supports LoRA, full fine-tuning, and instruction tuning.