Basic Setup
Developing models with Cerebrium is similar to developing on a virtual machine or Google Colab. Install the Cerebrium package and log in (see the installation docs for details), then create the project. Add the required Python packages to the [cerebrium.dependencies.pip] section of your cerebrium.toml file:
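A minimal sketch of that section is shown below; the package names and versions are illustrative assumptions, not a prescribed list — pin whatever your app actually needs.

```toml
[cerebrium.dependencies.pip]
# Illustrative packages for a Whisper transcription app; pin versions as needed.
openai-whisper = "latest"
pydantic = "latest"
requests = "latest"
```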
Create a util.py file for utility functions, such as downloading a file from a URL or converting a base64 string to a file:
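A stdlib-only sketch of those two helpers, assuming the function names used here (the real file may name them differently):

```python
import base64
import urllib.request


def download_file_from_url(url: str, filename: str) -> str:
    """Stream a file from a public URL to local disk."""
    with urllib.request.urlopen(url, timeout=60) as response, open(filename, "wb") as f:
        while chunk := response.read(8192):
            f.write(chunk)
    return filename


def save_base64_string_to_file(audio_b64: str, filename: str) -> str:
    """Decode a base64-encoded string and write the raw bytes to a file."""
    with open(filename, "wb") as f:
        f.write(base64.b64decode(audio_b64))
    return filename
```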
Create a main.py file with the main application code. The endpoint accepts either a base64-encoded string or a public URL of the audio file, passes it to the model, and returns the output. Start by defining the request object:
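One way to model that request is sketched below with a stdlib dataclass (Cerebrium examples often use Pydantic instead; the field names match the parameters described in this guide, but the validation detail is an assumption):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    audio: Optional[str] = None             # base64-encoded audio content
    file_url: Optional[str] = None          # public URL of the audio file
    webhook_endpoint: Optional[str] = None  # added by Cerebrium to every request

    def __post_init__(self):
        # Both sources are optional, but at least one must be supplied.
        if not self.audio and not self.file_url:
            raise ValueError("Provide either 'audio' or 'file_url'")
```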
Both audio and file_url are optional parameters, but at least one must be provided. The webhook_endpoint parameter, which Cerebrium automatically includes in every request, is useful for long-running requests.
Note: Cerebrium has a 3-minute timeout for each inference request. For long audio files (2+ hours) that take several minutes to process, use a webhook_endpoint — a URL where Cerebrium sends a POST request with the function’s results.
Model Setup and Inference
Import the required packages and load the Whisper model. The model downloads during initial deployment and is automatically cached in persistent storage for subsequent use. Loading the model outside the predict function ensures this code only runs on cold start (startup). For warm containers, only the predict function executes for inference.
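The cold-start pattern can be sketched as follows. `load_model` here is a stand-in for the real (expensive) call such as `whisper.load_model("medium")`, and the counter exists only to demonstrate that module-level code runs once per container:

```python
# Module scope: everything here runs once, at container cold start.
LOAD_COUNT = 0  # counter only to demonstrate single loading


def load_model():
    """Stand-in for an expensive call, e.g. whisper.load_model("medium")."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return object()  # placeholder for the model handle


model = load_model()  # cached for the container's lifetime


def predict(item):
    """Runs per request; warm containers skip the module-level code above."""
    return model  # reuse the already-loaded model
```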
The predict function, which runs only on inference requests, creates an audio file from either the download URL or the base64 string, transcribes it, and returns the output.
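The branching logic can be sketched as below. The transcription call is injected as a callable so the sketch stays self-contained; in the real app it would be the loaded Whisper model's `transcribe` method, and the URL branch would use the download helper from util.py:

```python
import base64
import tempfile
from typing import Callable, Optional


def predict(
    audio: Optional[str] = None,
    file_url: Optional[str] = None,
    *,
    transcribe: Callable[[str], dict],  # stand-in for model.transcribe
) -> dict:
    """Materialize the audio to a local file, then transcribe it."""
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
        if file_url:
            # In the real app: download_file_from_url(file_url, tmp.name)
            raise NotImplementedError("URL download elided in this sketch")
        elif audio:
            tmp.write(base64.b64decode(audio))
        else:
            raise ValueError("Provide either 'audio' or 'file_url'")
        path = tmp.name
    return transcribe(path)
```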
Deploy
Configure your compute and environment settings in cerebrium.toml:
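A sketch of such a configuration is shown below; the section names follow the usual cerebrium.toml layout, but the values (and the exact hardware fields) are illustrative assumptions for a Whisper workload:

```toml
[cerebrium.deployment]
name = "whisper"
python_version = "3.11"

[cerebrium.hardware]
# Values are illustrative; size to your model and traffic.
cpu = 2
memory = 16.0
compute = "AMPERE_A10"
```

With the configuration in place, deploy the app with the `cerebrium deploy` command.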
run_id — a unique identifier to correlate the result with the initial workload.
The endpoint returns results in this format:
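A sketch of the response shape, assuming only the fields described in this guide (the actual payload may include additional metadata):

```json
{
  "run_id": "<unique-request-id>",
  "result": {
    "text": "Transcribed text of the audio file..."
  }
}
```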