Pricing
Pay Only for What You Use
No reserved instances. No idle charges. Scale to zero and pay nothing when your models aren't running.
Model
Pay Per Compute Cycle
- Scale to zero — Pay nothing when idle
- Per-second GPU billing — Granular, fair pricing
- No minimum commitments — Start and stop anytime
- No egress fees — Move data freely
Comparison
How Cumulus Compares
| Feature | Cumulus | Modal | AWS SageMaker | RunPod |
|---|---|---|---|---|
| Cold Start Time | 12.5s | 60s | Minutes | Seconds* |
| Scale to Zero | Yes | Yes | No (with cost) | No |
| Per-Second Billing | Yes | Yes | Yes | Yes |
| Serverless | Yes | Yes | Partial | No (reserved) |
| Setup Required | None | Minimal | Significant | Minimal |
| Minimum Commitment | None | None | Varies | Hourly |
FAQ
Frequently Asked Questions
How does pay-per-compute pricing work?
You're billed for the actual GPU seconds your model uses during inference. When no requests are being processed, your deployment scales to zero — meaning zero cost. There are no charges for idle time, reserved capacity, or standby instances.
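As a rough illustration of the arithmetic, the sketch below sums GPU seconds across requests and multiplies by a per-second rate. The rate, the `Request` type, and `estimate_cost` are hypothetical names and values chosen for the example; they are not published Cumulus prices or part of any Cumulus API.

```python
from dataclasses import dataclass

# Hypothetical rate for illustration only; not a published Cumulus price.
ASSUMED_RATE_PER_GPU_SECOND = 0.001  # USD


@dataclass
class Request:
    gpu_seconds: float  # GPU time consumed while serving this request


def estimate_cost(requests, rate_per_gpu_second=ASSUMED_RATE_PER_GPU_SECOND):
    """Sum billed GPU seconds across requests and apply the per-second rate.

    Idle time never appears in the input, so it contributes nothing.
    """
    billed_seconds = sum(r.gpu_seconds for r in requests)
    return billed_seconds * rate_per_gpu_second


# Three inference requests, then an idle period with no requests at all.
busy_period = [Request(1.5), Request(0.5), Request(2.0)]
idle_period = []

busy_cost = estimate_cost(busy_period)
idle_cost = estimate_cost(idle_period)
```

Under this model an empty list of requests always costs zero, which is the scale-to-zero behavior described above.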
Are there any hidden fees?
No. Cumulus pricing is transparent. You pay for GPU compute time only. There are no egress fees, storage surcharges, or platform fees.
Can I set spending limits?
Yes. Cumulus supports configurable spending limits and alerts so you can control costs and avoid surprises.
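To make the idea of a limit plus an alert threshold concrete, here is a minimal client-side sketch. It does not show how Cumulus configures limits; `BudgetGuard`, `alert_fraction`, and the status strings are assumptions invented for this example.

```python
class BudgetGuard:
    """Hypothetical client-side spend tracker; not a Cumulus API."""

    def __init__(self, limit_usd, alert_fraction=0.8):
        self.limit_usd = limit_usd            # hard spending limit
        self.alert_fraction = alert_fraction  # share of the limit that triggers an alert
        self.spent_usd = 0.0

    def record(self, cost_usd):
        """Add a charge and return 'ok', 'alert', or 'limit_reached'."""
        self.spent_usd += cost_usd
        if self.spent_usd >= self.limit_usd:
            return "limit_reached"
        if self.spent_usd >= self.alert_fraction * self.limit_usd:
            return "alert"
        return "ok"


guard = BudgetGuard(limit_usd=10.0)
first = guard.record(5.0)   # 5.0 spent, below the alert threshold
second = guard.record(4.0)  # 9.0 spent, past 80% of the limit
third = guard.record(2.0)   # 11.0 spent, over the limit
```

A caller could pause deployments on `"limit_reached"` and send a notification on `"alert"`; both reactions are left out here because they depend on the platform's actual controls.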
What GPUs are available?
Cumulus offers NVIDIA A100, H100, and other datacenter-grade GPUs. GPU availability scales dynamically based on demand.