Pricing

Pay for what you use

Cerebrium charges based on actual compute time, measured in seconds. Resources scale dynamically with demand, allowing you to run everything from low-traffic workloads to high-throughput systems while maintaining precise, usage-based pricing at scale.

Compute costs

  • B200: $0.00167 /s
  • H200: $0.001166 /s
  • H100: $0.000944 /s
  • A100 (80GB): $0.000583 /s
  • A100 (40GB): $0.000555 /s
  • L40s: $0.000542 /s
  • A10: $0.000306 /s
  • L4: $0.000222 /s
  • T4: $0.000164 /s
  • CPU Only: $0.00000655 /vCPU/s


Other costs

  • Memory: $0.00000222 /GB/s
  • Storage: $0.05 /GB/mo

*First 100GB storage free


Hobby

For developers getting started

Free + compute / month

  • 3 user seats
  • Up to 3 deployed apps
  • 500 containers + 5 concurrent GPUs
  • Slack & Intercom support
  • 1 day log retention

Standard

For developers with ML apps in production

$100 + compute / month

  • Everything in Hobby plan
  • Unlimited seats
  • Unlimited apps
  • 1,000 containers + 30 concurrent GPUs
  • Custom domains

Enterprise

For teams looking to scale ML apps

Custom

  • Everything in Standard plan
  • Volume discounts
  • Unlimited concurrent GPUs
  • Dedicated Slack support
  • White glove onboarding
  • ML engineering services


Transparent pricing

Calculate costs based on your exact workload. For example, one A10 GPU (24 GB VRAM) with 1 vCPU and 8 GB of memory costs:

  • GPU cost: $0.000306/s
  • CPU cost: $0.000007/s
  • Memory cost: $0.000018/s
  • Total cost: $0.000330/s

Pricing FAQs

How does pricing compare to on-demand/spot on AWS?

Cerebrium pricing shouldn’t be directly compared to raw CPU or GPU instance prices on AWS, GCP, or NeoClouds. Traditional cloud compute often includes minutes of provisioning, warm-up, and idle time that you still pay for, along with the added cost of overprovisioning for peak demand. Cerebrium, by contrast, scales containers up and down in 1–3 seconds, and its memory and GPU snapshotting can restore workloads even faster, reducing billable startup overhead.

Our pricing also includes orchestration, networking, and the serverless platform required to run AI workloads in production, so comparing GPU cost alone misses the bigger picture. In addition, Cerebrium integrates across multiple cloud providers globally, allowing us to route workloads to the most cost-efficient infrastructure available through a single integration. For bursty or unpredictable workloads, this often makes Cerebrium more cost-effective overall, since you pay for less idle time and avoid the burden of managing infrastructure yourself.

Real world billing example

If you run a transcription workload on an L4 GPU with 2 vCPUs and 10GB memory (costing you $0.000257 per second) and each request runs for 2.4 seconds, then 500,000 requests in a month would cost roughly $309.

Because Cerebrium bills based on the resources you allocate and how long they actively run, you only pay for compute while your workload is processing requests.
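The arithmetic behind this example can be reproduced directly from the rates in the compute table above:

```python
# Transcription example: L4 GPU + 2 vCPUs + 10 GB memory, 2.4 s per request.
L4_RATE = 0.000222      # $ per GPU-second
CPU_RATE = 0.00000655   # $ per vCPU-second
MEM_RATE = 0.00000222   # $ per GB-second

per_second = L4_RATE + 2 * CPU_RATE + 10 * MEM_RATE
monthly = per_second * 2.4 * 500_000  # 500k requests per month

print(f"${per_second:.6f}/s")   # $0.000257/s
print(f"${monthly:.0f}/month")  # $309/month
```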

If you want a more comprehensive breakdown, you can take a look at our docs.

Can I use my AWS, GCP credits?

No. Cerebrium pricing is separate from AWS and GCP, so their cloud credits can’t be applied to usage on our platform.

Is there a discount for larger deployments or long-term contracts?

Yes. We offer discounts for larger deployments and longer-term commitments.

Pricing depends on several factors, including your expected spend, the number of consecutive months you plan to maintain that spend, and the specific GPU or compute SKUs you need. Discounts can also vary based on current infrastructure availability.

For larger or longer-term workloads, reach out to our team and we can put together pricing tailored to your deployment.

Do you offer guaranteed capacity without traditional reservations?

Yes. For bursty workloads, Cerebrium can guarantee access to capacity without requiring you to reserve and pay for infrastructure 24/7. Instead, you pay for the compute you use, with a minimum monthly spend commitment.

For example, we may guarantee access to up to 50 H100s at any point in time, for however long you need them, with a $10,000 minimum monthly spend.