- [cerebrium.deployment] Core settings like app name, Python version, and file inclusion rules
- [cerebrium.runtime.custom] Custom web server settings and app startup behavior
- [cerebrium.hardware] Compute resources including CPU, memory, and GPU specifications
- [cerebrium.scaling] Auto-scaling behavior and replica management
- [cerebrium.dependencies] Package management for Python (pip), system (apt), and Conda dependencies
Deployment Configuration
The [cerebrium.deployment] section defines core deployment settings.
| Option | Type | Default | Description |
|---|---|---|---|
| name | string | required | Desired app name |
| python_version | string | "3.12" | Python version to use (3.10, 3.11, 3.12) |
| disable_auth | boolean | false | Disable default token-based authentication on app endpoints |
| include | string[] | ["*"] | Files/patterns to include in deployment |
| exclude | string[] | [".*"] | Files/patterns to exclude from deployment |
| shell_commands | string[] | [] | Commands to run at the end of the build |
| pre_build_commands | string[] | [] | Commands to run before dependencies install |
| docker_base_image_url | string | "debian:bookworm-slim" | Base Docker image |
| use_uv | boolean | false | Use UV for faster Python package installation |
| deployment_initialization_timeout | integer | 600 (10 minutes) | Maximum time in seconds to wait for app initialisation during the build before timing out. Must be between 60 and 830 |
Changes to python_version or docker_base_image_url trigger full rebuilds since
they affect the base environment.
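The options above could be combined as in the following minimal sketch. The app name `my-app` is a hypothetical placeholder; every other value is simply the default from the table, written out explicitly.

```toml
[cerebrium.deployment]
name = "my-app"                           # hypothetical app name (required)
python_version = "3.12"                   # changing this triggers a full rebuild
disable_auth = false                      # keep token-based authentication on
include = ["*"]                           # include all files...
exclude = [".*"]                          # ...except dotfiles
pre_build_commands = []                   # run before dependencies install
shell_commands = []                       # run at the end of the build
docker_base_image_url = "debian:bookworm-slim"
deployment_initialization_timeout = 600   # seconds, must be between 60 and 830
```

Only `name` is required; the remaining keys can be omitted if the defaults suit your app.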
UV Package Manager
UV is a fast Python package installer written in Rust that significantly speeds up deployment times. When enabled, UV replaces pip for installing Python dependencies. UV typically installs packages 10-100x faster than pip, which is especially beneficial for:
- Large dependency trees
- Multiple packages
- Clean builds without cache
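As a minimal sketch, enabling UV is a one-line change in the deployment section. The app name below is a hypothetical placeholder.

```toml
[cerebrium.deployment]
name = "my-app"   # hypothetical app name
use_uv = true     # install Python dependencies with UV instead of pip
```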
Monitoring UV Usage
Check your build logs for these indicators:
- UV_PIP_INSTALL_STARTED - UV is successfully being used
- PIP_INSTALL_STARTED - Standard pip installation (when use_uv is false)
Deploying with UV Lock Files
Read this only if you’re using pyproject.toml and uv.lock.
- Ensure requirements.txt is in your project directory
- Deploy with UV enabled
Runtime Configuration
The [cerebrium.runtime.custom] section configures custom web servers and runtime behavior.
| Option | Type | Default | Description |
|---|---|---|---|
| port | integer | required | Port the application listens on |
| entrypoint | string[] | required | Command to start the application |
| healthcheck_endpoint | string | "" | HTTP path for health checks (empty uses TCP). Failure causes the instance to restart |
| readycheck_endpoint | string | "" | HTTP path for readiness checks (empty uses TCP). Failure ensures the load balancer does not route to the instance |
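A minimal sketch of this section for an ASGI app served by uvicorn might look like the following. The module path `main:app` and the `/health` and `/ready` routes are assumptions for illustration; your application must actually expose them.

```toml
[cerebrium.runtime.custom]
port = 8000
# The port passed to the server here must be the same as `port` above.
entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
healthcheck_endpoint = "/health"   # hypothetical route; failure restarts the instance
readycheck_endpoint = "/ready"     # hypothetical route; failure stops traffic routing
```

Leaving either endpoint as an empty string falls back to a TCP check, as described in the table.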
The port specified in entrypoint must match the port parameter. All endpoints
will be available at
https://api.aws.us-east-1.cerebrium.ai/v4/{project-id}/{app-name}/your/endpoint
Hardware Configuration
The [cerebrium.hardware] section defines compute resources.
| Option | Type | Default | Description |
|---|---|---|---|
| cpu | float | required | Number of CPU cores |
| memory | float | required | Memory allocation in GB |
| compute | string | "CPU" | Compute type (CPU, AMPERE_A10, etc.) |
| gpu_count | integer | 0 | Number of GPUs |
| provider | string | "aws" | Cloud provider |
| region | string | "us-east-1" | Deployment region |
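For illustration, a single-GPU configuration could be sketched as below. The CPU and memory figures are arbitrary example values, not recommendations; `AMPERE_A10` is the compute type named in the table.

```toml
[cerebrium.hardware]
cpu = 2.0                # CPU cores (example value)
memory = 8.0             # memory in GB (example value)
compute = "AMPERE_A10"   # use "CPU" for CPU-only apps
gpu_count = 1
provider = "aws"         # default
region = "us-east-1"     # default
```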
Scaling Configuration
The [cerebrium.scaling] section controls auto-scaling behavior.
| Option | Type | Default | CLI Requirement | Description |
|---|---|---|---|---|
| min_replicas | integer | 0 | 2.1.2+ | Minimum running instances |
| max_replicas | integer | 2 | 2.1.2+ | Maximum running instances |
| replica_concurrency | integer | 10 | 2.1.2+ | Concurrent requests per replica |
| response_grace_period | integer | 3600 | 2.1.2+ | Grace period in seconds |
| cooldown | integer | 1800 | 2.1.2+ | Time window (seconds) that must pass at reduced concurrency before scaling down. Helps avoid cold starts from brief traffic dips. |
| scaling_metric | string | "concurrency_utilization" | 2.1.2+ | Metric for scaling decisions (concurrency_utilization, requests_per_second, cpu_utilization, memory_utilization) |
| scaling_target | integer | 100 | 2.1.2+ | Target value for scaling metric (percentage for utilization metrics, absolute value for requests_per_second) |
| scaling_buffer | integer | optional | 2.1.2+ | Additional replica capacity above what scaling metric suggests |
| evaluation_interval_seconds | integer | 30 | 2.1.5+ | Time window in seconds over which metrics are evaluated before scaling decisions (6-300s) |
| load_balancing_algorithm | string | "" | 2.1.5+ | Algorithm for distributing traffic across replicas. Default: round-robin if replica_concurrency > 3, first-available otherwise. Options: round-robin, first-available, min-connections, random-choice-2 |
| compute_tier | string | "interruptible" | 2.1.6+ | Controls pod scheduling on spot vs on-demand instances. Options: interruptible (spot, lower cost), protected (on-demand, higher availability) |
| roll_out_duration_seconds | integer | 0 | 2.1.2+ | Gradually send traffic to new revision after successful build. Max 600s. Keep at 0 during development. |
scaling_metric options are:
- concurrency_utilization: Maintains a percentage of your replica_concurrency across instances. For example, with replica_concurrency=200 and scaling_target=80, maintains 160 requests per instance.
- requests_per_second: Maintains a specific request rate across all instances. For example, scaling_target=5 maintains 5 requests/s average across instances.
- cpu_utilization: Maintains CPU usage as a percentage of cerebrium.hardware.cpu. For example, with cpu=2 and scaling_target=80, maintains 80% CPU utilization (1.6 CPUs) per instance.
- memory_utilization: Maintains RAM usage as a percentage of cerebrium.hardware.memory. For example, with memory=10 and scaling_target=80, maintains 80% memory utilization (8GB) per instance.
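The concurrency_utilization example from the list above can be written out as a configuration sketch. The replica bounds and cooldown are the table defaults; the concurrency and target values are taken from the worked example, not from any recommendation.

```toml
[cerebrium.scaling]
min_replicas = 0
max_replicas = 2
replica_concurrency = 200
scaling_metric = "concurrency_utilization"
scaling_target = 80   # 80% of replica_concurrency: 0.80 * 200 = 160 requests per instance
cooldown = 1800       # seconds at reduced concurrency before scaling down
```

Switching `scaling_metric` to `requests_per_second` changes the meaning of `scaling_target` from a percentage to an absolute request rate.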
The scaling_buffer option is only available with concurrency_utilization and requests_per_second metrics.
It ensures extra capacity is maintained above what the scaling metric suggests. For example, with
min_replicas=0 and scaling_buffer=3, the system will maintain 3 replicas as baseline capacity.
Dependencies
Pip Dependencies
The [cerebrium.dependencies.pip] section lists Python package requirements.
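A minimal sketch follows, assuming each entry maps a package name to a version specifier string and that `"latest"` requests the newest release. The key/value shape and the package choices are assumptions for illustration, since this section does not spell out the format.

```toml
[cerebrium.dependencies.pip]
# Assumed shape: package = "version specifier"
torch = ">=2.0.0"
numpy = "latest"
```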
APT Dependencies
The [cerebrium.dependencies.apt] section specifies system packages.
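As a sketch under the same assumption (package name mapped to a version string), a system dependency might be declared like this. The package shown is only an example.

```toml
[cerebrium.dependencies.apt]
# Assumed shape: package = "version specifier"
ffmpeg = "latest"
```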
Conda Dependencies
The [cerebrium.dependencies.conda] section manages Conda packages.
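A Conda entry could be sketched as follows, again assuming a package-to-version mapping. The package name and version are illustrative only.

```toml
[cerebrium.dependencies.conda]
# Assumed shape: package = "version specifier"
cudatoolkit = "11.7"
```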
Dependency Files
The [cerebrium.dependencies.paths] section allows using requirement files.
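A sketch of pointing at a requirements file is shown below. The `pip` key and the file name are assumptions; check the keys your CLI version accepts before relying on them.

```toml
[cerebrium.dependencies.paths]
# Assumed shape: dependency type = "path relative to the project directory"
pip = "requirements.txt"
```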