Cumulus

Cumulus — The Fastest Serverless GPU Cloud for AI Inference

Deploy Any Model. The Fastest GPU Cold Starts.

Serverless GPU cloud with 12.5s cold starts. Cloud GPU inference that scales to zero — pay only for compute.

Founded by alumni from

Georgia Tech
NASA
Palantir
Space Force
Blackstone
UW Madison
Georgia Tech
NASA
Palantir
Space Force
Blackstone
UW Madison
Georgia Tech
NASA
Palantir
Space Force
Blackstone
UW Madison
Georgia Tech
NASA
Palantir
Space Force
Blackstone
UW Madison
Performance

GPU Cold Start to First Inference

Serverless GPU Inference — Flux 2 Diffusion Model

Cold Start
Model Load
0.0x Faster
0s20s40s60s
Cumulus
16.7s
12.5s
4.2s
Modal
70s
60s
10s

* Based on internal testing with memory snapshots and torch.compile() enabled on Modal.

Process

How Serverless GPU Hosting Works

01

Deploy to GPU Cloud Instantly

Deploy your model to serverless GPUs with one function call. Get back a model_id and inference endpoint ready to use.

deploy.py
from cumulus import deploy

# Deploy your model
model = deploy("./flux-2-schnell")

# Returns model_id and endpoint
print(model.id)       # "flux-2-abc123"
print(model.endpoint)  # "api.cumulus.io/flux-2-abc123"
02

Run GPU Inference

Use the model_id to run GPU inference, or hit the endpoint directly from any language.

inference.py
from cumulus import run

# Call using model_id
result = run("flux-2-abc123", {
    "prompt": "a watch on marble"
})

# Or use the endpoint directly
requests.post("api.cumulus.io/flux-2-abc123", ...)
03

Never Think About GPU Infra

We handle GPU selection, replicas, autoscaling, and failover. Fast GPU hosting without the ops.

Auto-Scaling
any cloud, any region
1replica
10replicas
100+replicas

Scale from 1 to 100+ replicas instantly

You write code. We handle the rest.

04

Pay Per GPU Compute Cycle

Billed by actual GPU compute used, not idle time. Scale to zero, pay nothing when idle. The cheapest way to run GPU inference.

Pay-Per-Compute
1 hour sample
0m15m30m45m60m
45%billed55% idle— free
$0.60/hr
$0.27(55% saved)

Cumulus OS

On-premises GPU hosting for your own private cloud.

GPU cluster management powered by Ion.