

Cerebrium’s default runtime covers most app needs. For more control, the custom runtime feature lets you run your own ASGI or WSGI server, enabling custom authentication, dynamic batching, frontend dashboards, public endpoints, and WebSocket connections.

Setting Up Custom Servers

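A basic FastAPI app running as a custom server on Cerebrium:
```python
# main.py (the entrypoint in cerebrium.toml refers to this module as main:app)
from fastapi import FastAPI

app = FastAPI()


@app.post("/hello")
def hello():
    return {"message": "Hello Cerebrium!"}


@app.get("/health")
def health():
    # Health check: the process is up and serving HTTP.
    return "OK"


@app.get("/ready")
def ready():
    # Ready check: the instance can accept traffic.
    return "OK"
```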
Configure this server in cerebrium.toml by adding a custom runtime section. Because the entrypoint runs uvicorn, uvicorn must also be listed in the pip dependencies:
```toml
[cerebrium.runtime.custom]
port = 5000
entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "5000"]
healthcheck_endpoint = "/health"
readycheck_endpoint = "/ready"

[cerebrium.dependencies.pip]
pydantic = "latest"
numpy = "latest"
loguru = "latest"
fastapi = "latest"
uvicorn = "latest"
```
The custom runtime section accepts the following parameters:
  • entrypoint: The command that starts your server.
  • port: The port your server listens on. This should match the port passed in the entrypoint command.
  • healthcheck_endpoint: The endpoint used to confirm that the instance is healthy. If unspecified, a TCP ping on the configured port is used instead. If the health check returns a non-200 response, the instance is considered unhealthy and is restarted if it does not recover in time.
  • readycheck_endpoint: The endpoint used to confirm that the instance is ready to receive requests. If unspecified, a TCP ping on the configured port is used instead. If the ready check returns a non-200 response, the instance is not used as a target for request routing.
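The difference between the two checks matters most when startup is slow, for example while model weights load. The sketch below is a minimal illustration, not Cerebrium's own code: it swaps FastAPI for a plain WSGI callable so it needs only the Python standard library, and it assumes the app is served by gunicorn with an entrypoint such as ["gunicorn", "main:app", "--bind", "0.0.0.0:5000"] and gunicorn added to the pip dependencies. MODEL_READY and load_model() are hypothetical stand-ins for your own startup work. The health endpoint answers immediately, while the ready endpoint returns 503 until loading finishes.

```python
# main.py: hypothetical WSGI variant of the server above (stdlib only).
# MODEL_READY and load_model() are illustrative stand-ins, not Cerebrium APIs.
import threading
import time

MODEL_READY = threading.Event()  # set once the (hypothetical) startup work finishes


def load_model() -> None:
    """Stand-in for slow startup work such as downloading or loading weights."""
    time.sleep(5)
    MODEL_READY.set()


# Start loading in the background so /health can answer right away.
threading.Thread(target=load_model, daemon=True).start()


def app(environ, start_response):
    path = environ.get("PATH_INFO", "")
    if path == "/health":
        # Health: the process is alive and serving HTTP.
        status, body = "200 OK", b"OK"
    elif path == "/ready":
        # Readiness: non-200 until startup work is done, so the instance
        # is not used as a routing target while it is still loading.
        if MODEL_READY.is_set():
            status, body = "200 OK", b"OK"
        else:
            status, body = "503 Service Unavailable", b"loading"
    else:
        status, body = "404 Not Found", b"not found"
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]
```

Keeping the two checks separate means a slow-loading instance is not restarted for being unhealthy, yet also receives no traffic until it can actually serve it.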
For ASGI applications such as FastAPI, include an ASGI server package (for example, uvicorn) in your dependencies. After deployment, each route is available at https://api.aws.us-east-1.cerebrium.ai/v4/[project-id]/[app-name]/your/endpoint, where your/endpoint is the route path; the /hello route above, for instance, is served at .../[app-name]/hello.
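One way to call a deployed route from Python is sketched below using only urllib. The base URL is copied from the text above; confirm the exact URL for your app in the Cerebrium dashboard. Sending the API key as a Bearer token in the Authorization header is an assumption here, and endpoint_url() and build_request() are hypothetical helpers.

```python
import urllib.request

# Base URL taken from the docs above; check your dashboard for your app's exact URL.
BASE_URL = "https://api.aws.us-east-1.cerebrium.ai/v4"


def endpoint_url(project_id: str, app_name: str, path: str) -> str:
    """Join project ID, app name, and route path into a full endpoint URL."""
    return f"{BASE_URL}/{project_id}/{app_name}/{path.lstrip('/')}"


def build_request(project_id: str, app_name: str, path: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to a route on a deployed app."""
    return urllib.request.Request(
        endpoint_url(project_id, app_name, path),
        method="POST",
        # Assumption: the API key is sent as a Bearer token.
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

To send it, pass the request to urllib.request.urlopen(req) inside a with block and read the response body; for the /hello route above, that body should be the JSON message returned by the hello() handler.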
The FastAPI Server Example provides a complete implementation.

Request Headers

Custom web servers receive the Cerebrium run ID in the X-Request-Id header on every request. This corresponds to the internal run_id and is useful for tracking and debugging.
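A minimal sketch of picking up the run ID in a plain WSGI app, using only the standard library. Under the WSGI specification (PEP 3333), an incoming X-Request-Id header appears in the environ as HTTP_X_REQUEST_ID. The with_run_id wrapper, the echo_app handler, and the cerebrium.run_id environ key are hypothetical names chosen for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def with_run_id(wsgi_app):
    """Wrap a WSGI app so every request logs its Cerebrium run ID."""
    def wrapper(environ, start_response):
        # WSGI exposes the X-Request-Id header under the key HTTP_X_REQUEST_ID.
        run_id = environ.get("HTTP_X_REQUEST_ID", "missing")
        # Hypothetical environ key so downstream handlers can reuse the ID.
        environ["cerebrium.run_id"] = run_id
        logger.info("run_id=%s path=%s", run_id, environ.get("PATH_INFO", ""))
        return wsgi_app(environ, start_response)
    return wrapper


def echo_app(environ, start_response):
    """Toy handler that returns the run ID it was given."""
    body = environ["cerebrium.run_id"].encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


app = with_run_id(echo_app)
```

In FastAPI, the equivalent is to declare a Request parameter on the route and read request.headers.get("x-request-id"); header lookups there are case-insensitive.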