About

The inference platform
for production AI.

Cumulus Labs consolidates the eight things every production AI team builds for themselves — gateway, router, cache, observability, evaluation, fine-tuning, custom hosting, and the inference engine — into a single platform with a single API.

Mission

Make production inference
a configuration choice.

AI teams should not have to assemble seven vendors plus a cloud GPU rental to ship an inference product. Cumulus exists so that routing, caching, observability, evaluation, fine-tuning, and the runtime itself can be one platform behind one API — and so the engineers who would have built that stack can build the product instead.

The Platform

Eight subsystems.
One platform.

Gateway

OpenAI-compatible HTTP layer. One client, every provider.

Router

Declared per-workflow routing. Deterministic. Traceable.

Cache

Exact-match, prefix, and semantic caches. Stacked.

Observability

Every request logged. Replayable audit log.

Evaluation

Synthetic data, heuristics, judges, shadow evaluation.

Fine-tune

One-click LoRA training on captured traffic.

Custom hosting

Bring open weights or fine-tunes. Served on Ion.

Ion

Custom attention kernels on NVIDIA Grace and Blackwell.

Backers

Backed by

Y Combinator (W26)

Selected for the Winter 2026 batch — building alongside the world-class founders defining the next decade of AI infrastructure.

NVIDIA Inception

Member of NVIDIA's program for the highest-leverage startups working at the frontier of AI and accelerated compute.

Team

Founded by alumni from

Engineers and operators out of defense, finance, and the institutions that built modern GPU compute.

Palantir

NASA

Space Force

Blackstone

Georgia Tech

UW Madison

Values

What we believe

Speed is a Feature

Every millisecond of inference latency multiplies across every user, every request. We optimize the runtime itself.

Simplicity Over Complexity

The best inference platform is invisible. Change one line. Keep your code. The seven other vendors disappear.

Open by Default

Any model. Any provider. Any framework. The router decides — and you can override it at any time.

One line of code away.

Get started Open roles

The inference platformfor production AI.

Make production inferencea configuration choice.

Eight subsystems.One platform.