AI Inference Endpoints

Coming soon

Deploy a model. Get an HTTP endpoint. Pay per request.

What it does

Wrap any model in an inference container and deploy it as an Inference Endpoint. We give you a stable HTTPS URL, request-level metering, and automatic scaling based on traffic. Use it from anywhere over a normal HTTP API.

Behind the endpoint, your model runs on GPU Compute. That means you get cheap inference prices without the hyperscaler markup. Deploy LLaMA, Mistral, Stable Diffusion, Whisper, or your own fine-tuned model with the same flow.

Why Chess Cloud

Stable HTTPS URL with request-level metering
Automatic scaling based on traffic
Runs on GPU Compute for cheap inference prices
Deploy LLaMA, Mistral, Stable Diffusion, Whisper, or your own model

Coming soon

AI Inference Endpoints is not available yet. Leave your email and we'll notify you when it launches.

Related services

Fine-tuning Jobs

Fine-tune open-weight models with your dataset. Supports LoRA, full fine-tuning, and instruction tuning.

Inference Gateway

Request routing, per-key rate limiting, usage logs, and a uniform API across different model runtimes.