Graceful Termination
Cerebrium runs in a shared, multi-tenant environment. The platform continuously adjusts capacity, spinning down nodes and launching new ones to scale, optimize compute usage, and roll out updates. Workloads are migrated to new nodes during this process. Applications also have metric-based autoscaling criteria that dictate when instances scale, remain active, or shift during deployments. Implement graceful termination to prevent requests from ending prematurely when instances are marked for termination.
Understanding Instance Termination
For both application autoscaling and internal node scaling, the platform sends a SIGTERM signal to warn the application of an impending shutdown. Cortex applications (Cerebrium’s default runtime) handle this automatically. Custom runtimes must catch and handle this signal to shut down gracefully. Once response_grace_period elapses, the platform sends a SIGKILL signal, terminating the instance immediately.
When Cerebrium terminates a container, the following sequence occurs:
- Stop routing new requests to the container.
- Send a SIGTERM signal to the container.
- Wait for response_grace_period seconds to elapse.
- Send SIGKILL if the container hasn’t stopped.
Without graceful handling, the application stops as soon as it receives SIGTERM, which interrupts in-flight requests and causes 502 errors.
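The application's side of this sequence can be sketched with the Python standard library alone. This is a minimal illustration, not Cerebrium's implementation: the names shutdown_requested, handle_sigterm, install_sigterm_handler, and run_jobs are hypothetical, and the loop stands in for whatever unit of work a custom runtime processes.

```python
import signal
import threading

# Set once SIGTERM arrives; the work loop checks it between jobs.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request instead of exiting immediately."""
    shutdown_requested.set()


def install_sigterm_handler():
    """Register the handler (signal.signal must be called from the main thread)."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_jobs(jobs, handle_job):
    """Process jobs in order, stopping before the next one once shutdown is requested.

    Returns the results of the jobs that completed.
    """
    results = []
    for job in jobs:
        if shutdown_requested.is_set():
            break  # stop taking new work; completed work is kept
        results.append(handle_job(job))
    return results
```

The job that is running when SIGTERM arrives is allowed to finish, and only new work is refused. For this to help, each job must complete before response_grace_period elapses, since SIGKILL cannot be caught.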
Implementation
For custom runtimes using FastAPI, implement the lifespan pattern to respond to SIGTERM.
main.py
Create a new file named main.py with the following:
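As a hedged illustration of the core of such a file, the sketch below shows request tracking and a drain step in the lifespan shape, using only the standard library. It is not the complete main.py: the AppState fields and method names, the tracker attribute, and DRAIN_TIMEOUT_SECONDS are assumptions made for this example, and it assumes the server runs the lifespan shutdown phase when it receives SIGTERM.

```python
import asyncio
from contextlib import asynccontextmanager

# Hypothetical value; keep it below your response_grace_period.
DRAIN_TIMEOUT_SECONDS = 25


class AppState:
    """Tracks in-flight requests and whether the app is shutting down."""

    def __init__(self):
        self.active_requests = 0
        self.shutting_down = False
        self.lock = asyncio.Lock()

    async def request_started(self):
        async with self.lock:
            self.active_requests += 1

    async def request_finished(self):
        async with self.lock:
            self.active_requests -= 1

    async def wait_until_idle(self, timeout, poll_interval=0.05):
        """Return True if all requests finished within `timeout` seconds."""
        loop = asyncio.get_running_loop()
        deadline = loop.time() + timeout
        while loop.time() < deadline:
            async with self.lock:
                if self.active_requests == 0:
                    return True
            await asyncio.sleep(poll_interval)
        async with self.lock:
            return self.active_requests == 0


@asynccontextmanager
async def lifespan(app):
    # Startup: create the tracker inside the running event loop.
    tracker = AppState()
    app.state.tracker = tracker
    yield
    # Shutdown: stop reporting ready, then wait for in-flight requests.
    tracker.shutting_down = True
    await tracker.wait_until_idle(DRAIN_TIMEOUT_SECONDS)
```

With FastAPI, a function of this shape can be passed as FastAPI(lifespan=lifespan). Middleware or request handlers would then call request_started() and request_finished() around each request, and the /ready handler would report not ready while shutting_down is true.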
cerebrium.toml
If you already have a cerebrium.toml file, add or update these sections. If you don’t have one, create a new file with the following:
- replica_concurrency should match max_concurrency in your AppState class (if you add that field)
- port must match the port in your Dockerfile CMD
- Adjust hardware settings based on your application needs
Dockerfile
Create a new file named Dockerfile with the following:
requirements.txt
Add to your existing requirements.txt or create a new file with:
Key Points
- The /ready endpoint is essential for proper load balancing during scaling events
- Without a proper /ready endpoint, Cerebrium uses a TCP ping, which only checks whether the port is open and can route traffic to replicas that are shutting down
- All request tracking uses asyncio locks so that concurrent requests update the shared counters consistently
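The readiness decision behind a /ready endpoint can be sketched as a small pure function. This is an illustration under stated assumptions: Readiness and ready_response are hypothetical names, the response bodies are made up for the example, and it assumes the platform treats a 503 response as "not ready", which is the common convention for readiness checks but is not confirmed here.

```python
from dataclasses import dataclass


@dataclass
class Readiness:
    """Minimal stand-in for the application state consulted by /ready."""

    shutting_down: bool = False


def ready_response(readiness):
    """Return (HTTP status code, JSON-serializable body) for a /ready check."""
    if readiness.shutting_down:
        # Assumption: a 503 tells the load balancer to stop sending traffic here.
        return 503, {"status": "shutting_down"}
    return 200, {"status": "ready"}
```

Unlike a TCP check, this distinguishes a replica that still has its port open but is draining from one that can accept new requests.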