Pricing

Pay for what you use

Cerebrium charges based on actual compute time, measured in seconds. Resources scale dynamically with demand, allowing you to run everything from low-traffic workloads to high-throughput systems while maintaining precise, usage-based pricing at scale.

Compute costs

  • B200: $0.00167 /s
  • H200: $0.001166 /s
  • H100: $0.000944 /s
  • A100 (80GB): $0.000583 /s
  • A100 (40GB): $0.000555 /s
  • L40s: $0.000542 /s
  • A10: $0.000306 /s
  • L4: $0.000222 /s
  • T4: $0.000164 /s
  • CPU Only: $0.00000655 /vCPU/s


Other costs

  • Memory: $0.00000222 /GB/s
  • Storage: $0.05 /GB/mo

*First 100GB storage free


Hobby

For developers getting started

Free + compute / month

  • 3 user seats
  • Up to 3 deployed apps
  • 500 containers + 5 concurrent GPUs
  • Slack & Intercom support
  • 1 day log retention

Standard

For developers with ML apps in production

$100 + compute / month

  • Everything in Hobby plan
  • Unlimited seats
  • Unlimited apps
  • 1,000 containers + 30 concurrent GPUs
  • Custom domains

Enterprise

For teams looking to scale ML apps

Custom

  • Everything in Standard plan
  • Volume discounts
  • Unlimited concurrent GPUs
  • Dedicated Slack support
  • White glove onboarding
  • ML engineering services


Transparent pricing

Calculate costs based on your exact workload. For example, one A10 GPU (24 GB VRAM) with 1 vCPU and 8 GB of memory costs:

  • GPU cost: $0.000306/s
  • CPU cost: $0.000007/s
  • Memory cost: $0.000018/s
  • Total cost: $0.000330/s

Pricing FAQs

How does pricing compare to on-demand/spot on AWS?

Cerebrium pricing shouldn’t be directly compared to raw CPU or GPU instance prices on AWS, GCP, or NeoClouds. Traditional cloud compute often includes minutes of provisioning, warm-up, and idle time that you still pay for, along with the added cost of overprovisioning for peak demand. Cerebrium, by contrast, scales containers up and down in 1–3 seconds, and its memory and GPU snapshotting can restore workloads even faster, reducing billable startup overhead.

Our pricing also includes orchestration, networking, and the serverless platform required to run AI workloads in production, so comparing GPU cost alone misses the bigger picture. In addition, Cerebrium integrates across multiple cloud providers globally, allowing us to route workloads to the most cost-efficient infrastructure available through a single integration. For bursty or unpredictable workloads, this often makes Cerebrium more cost-effective overall, since you pay for less idle time and avoid the burden of managing infrastructure yourself.

Real world billing example

If you run a transcription workload on an L4 GPU with 2 vCPUs and 10GB memory (costing you $0.000257 per second) and each request runs for 2.4 seconds, then 500,000 requests in a month would cost roughly $309.

Because Cerebrium bills based on the resources you allocate and how long they actively run, you only pay for compute while your workload is processing requests.
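The arithmetic behind this example can be reproduced directly from the rates in the compute table above:

```python
# Transcription example: L4 GPU + 2 vCPUs + 10 GB memory, 2.4 s per request.
L4_RATE = 0.000222      # $ per GPU-second
CPU_RATE = 0.00000655   # $ per vCPU-second
MEM_RATE = 0.00000222   # $ per GB-second

per_second = L4_RATE + 2 * CPU_RATE + 10 * MEM_RATE
monthly = per_second * 2.4 * 500_000  # 500k requests per month

print(f"${per_second:.6f}/s")   # $0.000257/s
print(f"${monthly:.0f}/month")  # $309/month
```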

If you want a more comprehensive breakdown, you can take a look at our docs.

Can I use my AWS, GCP credits?

No. Cerebrium pricing is separate from AWS and GCP, so their cloud credits can’t be applied to usage on our platform.

Is there a discount for larger deployments or long-term contracts?

Yes. We offer discounts for larger deployments and longer-term commitments.

Pricing depends on several factors, including your expected spend, the number of consecutive months you plan to maintain that spend, and the specific GPU or compute SKUs you need. Discounts can also vary based on current infrastructure availability.

For larger or longer-term workloads, reach out to our team and we can put together pricing tailored to your deployment.

Do you offer guaranteed capacity without traditional reservations?

Yes. For bursty workloads, Cerebrium can guarantee access to capacity without requiring you to reserve and pay for infrastructure 24/7. Instead, you pay for the compute you use, with a minimum monthly spend commitment.

For example, we may guarantee access to up to 50 H100s at any point in time, for however long you need them, with a $10,000 minimum monthly spend.