Pricing
Pay for what you use
Cerebrium charges based on actual compute time, measured in seconds. Resources scale dynamically with demand, allowing you to run everything from low-traffic workloads to high-throughput systems while maintaining precise, usage-based pricing at scale.
Compute costs

| Compute | Price |
| --- | --- |
| B200 | $0.00167 /s |
| H200 | $0.001166 /s |
| H100 | $0.000944 /s |
| A100 (80GB) | $0.000583 /s |
| A100 (40GB) | $0.000555 /s |
| L40s | $0.000542 /s |
| A10 | $0.000306 /s |
| L4 | $0.000222 /s |
| T4 | $0.000164 /s |
| CPU Only | $0.00000655 /vCPU/s |

| Other | Price |
| --- | --- |
| Memory | $0.00000222 /GB/s |
| Storage | $0.05 /GB/mo |

*First 100GB of storage free
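As a quick sketch of how the storage line's free tier works (assuming, as the footnote suggests, that only usage beyond the first 100GB is billed at $0.05/GB/month):

```python
def monthly_storage_cost(storage_gb: float) -> float:
    """Monthly storage bill: first 100GB free, then $0.05/GB/month."""
    return max(0.0, storage_gb - 100) * 0.05

# e.g. 250GB stored: 150 billable GB at $0.05/GB
cost = monthly_storage_cost(250)   # $7.50
free = monthly_storage_cost(80)    # $0.00, under the free tier
```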
Hobby
For developers getting started
Free + compute / month
- 3 user seats
- Up to 3 deployed apps
- 500 containers + 5 Concurrent GPUs
- Slack & Intercom support
- 1 day log retention
Standard
For developers with ML apps in production
$100 + compute / month
- Everything in Hobby plan
- Unlimited seats
- Unlimited apps
- 1000 containers + 30 concurrent GPUs
- Custom domains
Enterprise
For teams looking to scale ML apps
Custom
- Everything in Standard plan
- Volume Discounts
- Unlimited Concurrent GPUs
- Dedicated Slack support
- White glove onboarding
- ML engineering services
Detailed Plan Comparison

Hobby (Free)

Workspace
- Projects: Unlimited
- Deployed applications: 3
- Seats: 3
- Custom Domains

Data & Compliance
- Log retention: 7 days
- SOC2 compliance
- HIPAA, GDPR, ISO 27001

Project Specifics
- CPU concurrency: 500
- GPU concurrency: 5
- Real-time Observability (In-app logging & monitoring)

Support
- Community support
- Private Slack Channel
- ML Engineering Service

Standard ($100/month)

Workspace
- Projects: Unlimited
- Deployed applications: Unlimited
- Seats: 10
- Custom Domains

Data & Compliance
- Log retention: 30 days
- SOC2 compliance
- HIPAA, GDPR, ISO 27001

Project Specifics
- CPU concurrency: 1000
- GPU concurrency: 30
- Real-time Observability (In-app logging & monitoring)

Support
- Community support
- Private Slack Channel
- ML Engineering Service

Enterprise (Custom)

Workspace
- Projects: Unlimited
- Deployed applications: Unlimited
- Seats: Custom
- Custom Domains

Data & Compliance
- Log retention: Unlimited
- SOC2 compliance
- HIPAA, GDPR, ISO 27001

Project Specifics
- CPU concurrency: Unlimited
- GPU concurrency: Unlimited
- Real-time Observability (In-app logging & monitoring)

Support
- Community support
- Private Slack Channel
- ML Engineering Service
Transparent Pricing
Calculate costs based on your exact workload. Only pay for what you use.
Pricing FAQs
How does pricing compare to on-demand/spot on AWS?
Cerebrium pricing shouldn’t be directly compared to raw CPU or GPU instance prices on AWS, GCP, or NeoClouds. Traditional cloud compute often includes minutes of provisioning, warm-up, and idle time that you still pay for, along with the added cost of overprovisioning for peak demand. Cerebrium, by contrast, scales containers up and down in 1–3 seconds, and its memory and GPU snapshotting can restore workloads even faster, reducing billable startup overhead.
Our pricing also includes orchestration, networking, and the serverless platform required to run AI workloads in production, so comparing GPU cost alone misses the bigger picture. In addition, Cerebrium integrates across multiple cloud providers globally, allowing us to route workloads to the most cost-efficient infrastructure available through a single integration. For bursty or unpredictable workloads, this often makes Cerebrium more cost-effective overall, since you pay for less idle time and avoid the burden of managing infrastructure yourself.
Real world billing example
If you run a transcription workload on an L4 GPU with 2 vCPUs and 10GB memory (costing you $0.000257 per second) and each request runs for 2.4 seconds, then 500,000 requests in a month would cost roughly $309.
Because Cerebrium bills based on the resources you allocate and how long they actively run, you only pay for compute while your workload is processing requests.
If you want a more comprehensive breakdown, you can take a look in our docs here.
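The arithmetic in the example above can be sketched as follows, using the published per-second rates for the L4 GPU, vCPUs, and memory from the pricing table (the function name and structure are illustrative, not an official API):

```python
# Per-second rates from the pricing table (USD)
L4_GPU_RATE = 0.000222      # L4 GPU, $/s
VCPU_RATE = 0.00000655      # per vCPU, $/s
MEMORY_RATE = 0.00000222    # per GB of memory, $/s

def per_second_cost(gpu_rate: float, vcpus: int, memory_gb: float) -> float:
    """Per-second cost of an allocated container: GPU + vCPUs + memory."""
    return gpu_rate + vcpus * VCPU_RATE + memory_gb * MEMORY_RATE

# The transcription workload: L4 GPU, 2 vCPUs, 10GB memory
rate = per_second_cost(L4_GPU_RATE, vcpus=2, memory_gb=10)   # ~$0.000257/s

# 500,000 requests per month at 2.4 seconds each
monthly_cost = rate * 2.4 * 500_000                          # ~$309
```

Because billing is per second of active compute, halving the per-request runtime (say, via a faster model) would halve this bill directly.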
Can I use my AWS, GCP credits?
No. Cerebrium pricing is separate from AWS and GCP, so their cloud credits can’t be applied to usage on our platform.
Is there a discount for larger deployments or long-term contracts?
Yes. We offer discounts for larger deployments and longer-term commitments.
Pricing depends on several factors, including your expected spend, the number of consecutive months you plan to maintain that spend, and the specific GPU or compute SKUs you need. Discounts can also vary based on current infrastructure availability.
For larger or longer-term workloads, reach out to our team and we can put together pricing tailored to your deployment.
Do you offer guaranteed capacity without traditional reservations?
Yes. For bursty workloads, Cerebrium can guarantee access to capacity without requiring you to reserve and pay for infrastructure 24/7. Instead, you pay for the compute you use, with a minimum monthly spend commitment.
For example, we may guarantee access to up to 50 H100s at any point in time, for however long you need them, with a $10,000 minimum monthly spend.