
GPUs accelerate computational workloads through parallel processing. Originally designed for graphics rendering, modern GPUs are essential for AI models, large-scale data processing, and other compute-intensive applications. Cerebrium provides GPU access through configuration in the cerebrium.toml file, without requiring infrastructure management.

Specifying GPUs

Configure GPUs in the [cerebrium.hardware] section of cerebrium.toml, specifying the type (compute parameter) and quantity (gpu_count). Additional deployment and scaling considerations are covered in the sections below.
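A minimal hardware section might look like the sketch below; the compute and gpu_count keys come from this page, and the ADA_L4 value is taken from the table in the next section.

```toml
[cerebrium.hardware]
compute = "ADA_L4"  # GPU type: an identifier from the Available GPUs table
gpu_count = 1       # GPU quantity
```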

Available GPUs

The platform offers GPUs ranging from cost-effective development options to high-end enterprise hardware.
| GPU Model    | Identifier       | VRAM (GB) | Max GPUs | Plan required |
| ------------ | ---------------- | --------- | -------- | ------------- |
| NVIDIA B300  | BLACKWELL_B300   | 262       | 8        | Enterprise    |
| NVIDIA B200  | BLACKWELL_B200   | 180       | 8        | Enterprise    |
| NVIDIA H200  | HOPPER_H200      | 141       | 8        | Enterprise    |
| NVIDIA H100  | HOPPER_H100      | 80        | 8        | Enterprise    |
| NVIDIA A100  | AMPERE_A100_80GB | 80        | 8        | Standard      |
| NVIDIA A100  | AMPERE_A100_40GB | 40        | 8        | Standard      |
| NVIDIA L40S  | ADA_L40          | 48        | 8        | Hobby+        |
| NVIDIA L4    | ADA_L4           | 24        | 8        | Hobby+        |
| NVIDIA A10   | AMPERE_A10       | 24        | 8        | Hobby+        |
| NVIDIA T4    | TURING_T4        | 16        | 8        | Hobby+        |
| AWS Trainium | TRN1             | 32        | 8        | Hobby+        |
The identifier is the value used in the cerebrium.toml file. It combines the GPU's architecture generation with its model name to avoid ambiguity; for example, AMPERE_A100_80GB denotes the Ampere-generation A100 with 80 GB of VRAM.
GPUs can also be selected with the --compute and --gpu-count flags during application initialization, as in the sketch below.
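Assuming the flags attach to the CLI's init command, initialization might look like this (the app name is a placeholder):

```bash
# Scaffold a new app pinned to a single L4 GPU
cerebrium init my-app --compute ADA_L4 --gpu-count 1
```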

Multi-GPU Configuration

Multiple GPUs are configured in the cerebrium.toml file:
```toml
[cerebrium.hardware]
compute = "AMPERE_A100_80GB"  # GPU identifier from the table above
gpu_count = 4                 # Number of GPUs needed
cpu = 8                       # vCPU cores
memory = 128.0                # RAM in GB
```
GPU availability varies by region and provider. The more tightly a deployment is pinned to a specific provider and region, the more likely requests are to queue while capacity frees up. For guaranteed burst capacity, contact the team about an enterprise plan.
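As a sketch of what such pinning might look like, the provider and region keys below are assumptions rather than confirmed settings; consult the hardware configuration reference for the exact key names and accepted values.

```toml
[cerebrium.hardware]
compute = "HOPPER_H100"
gpu_count = 1
provider = "aws"      # Assumed key: pin the deployment to one cloud provider
region = "us-east-1"  # Assumed key: pin to one region; tighter pins queue more often
```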