Pricing

Pay Only for What You Use

No reserved instances. No idle charges. Scale to zero and pay nothing when your models aren't running.

Model

Pay Per Compute Cycle

  • Scale to zero: Pay nothing when idle
  • Per-second GPU billing: Granular, fair pricing
  • No minimum commitments: Start and stop anytime
  • No egress fees: Move data freely
Comparison

How Cumulus Compares

Feature             Cumulus  Modal    SageMaker       RunPod
Cold Start Time     12.5s    60s      Minutes         Seconds*
Scale to Zero       Yes      Yes      No (with cost)  No
Per-Second Billing  Yes      Yes      Yes             Yes
Serverless          Yes      Yes      Partial         No (reserved)
Setup Required      None     Minimal  Significant     Minimal
Minimum Commitment  None     None     Varies          Hourly

FAQ

Frequently Asked Questions

How does pay-per-compute pricing work?
You're billed for the actual GPU seconds your model uses during inference. When no requests are being processed, your deployment scales to zero, which means zero cost. There are no charges for idle time, reserved capacity, or standby instances.
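As a rough illustration of the arithmetic described above, the sketch below multiplies billable GPU seconds by a per-second rate. The function name `inference_cost` and the rate are hypothetical; the rate in particular is a placeholder for illustration and not a published Cumulus price.

```python
def inference_cost(gpu_seconds: float, rate_per_gpu_second: float) -> float:
    """Return the charge for a billing period.

    Only seconds spent serving requests count; idle time contributes zero
    GPU seconds and therefore zero cost.
    """
    if gpu_seconds < 0 or rate_per_gpu_second < 0:
        raise ValueError("gpu_seconds and rate_per_gpu_second must be non-negative")
    return gpu_seconds * rate_per_gpu_second


# ASSUMPTION: placeholder rate in dollars per GPU-second, not a real price.
HYPOTHETICAL_RATE = 0.001

# 1,000 requests that each use 0.5 GPU-seconds give 500 billable GPU-seconds.
busy_cost = inference_cost(1_000 * 0.5, HYPOTHETICAL_RATE)

# A period with no requests has no billable seconds, so the charge is zero.
idle_cost = inference_cost(0, HYPOTHETICAL_RATE)

print(f"busy period: ${busy_cost:.2f}, idle period: ${idle_cost:.2f}")
```

Under this placeholder rate, the busy period costs about fifty cents and the idle period costs nothing, which is the scale-to-zero behavior the answer describes.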
Are there any hidden fees?
No. Cumulus pricing is transparent. You pay for GPU compute time only. There are no egress fees, storage surcharges, or platform fees.
Can I set spending limits?
Yes. Cumulus supports configurable spending limits and alerts so you can control costs and avoid surprises.
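The page does not describe how limits and alerts are configured, so the following is only a client-side sketch of the general idea: track accumulated spend and report when an alert threshold or a hard limit is crossed. The `SpendGuard` class, its fields, and the 80% alert threshold are all hypothetical and are not part of any Cumulus API.

```python
from dataclasses import dataclass


@dataclass
class SpendGuard:
    """Hypothetical local tracker for a spending limit with an early alert."""

    limit: float                 # hard cap in dollars for the billing period
    alert_fraction: float = 0.8  # ASSUMPTION: alert at 80% of the limit
    spent: float = 0.0           # running total of recorded charges

    def record(self, cost: float) -> str:
        """Add a charge and return 'ok', 'alert', or 'limit_reached'."""
        if cost < 0:
            raise ValueError("cost must be non-negative")
        self.spent += cost
        if self.spent >= self.limit:
            return "limit_reached"
        if self.spent >= self.alert_fraction * self.limit:
            return "alert"
        return "ok"


guard = SpendGuard(limit=10.0)
for charge in (5.0, 3.5, 2.0):
    status = guard.record(charge)
    print(f"charged ${charge:.2f}, total ${guard.spent:.2f}: {status}")
```

A real deployment would rely on the platform's own limit and alert settings; a local check like this could at most serve as an extra safeguard in client code.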
What GPUs are available?
Cumulus offers NVIDIA A100, H100, and other datacenter-grade GPUs. GPU availability scales dynamically based on demand.

Ready to get started?