Tutorial

Oct 28, 2024

ML apps at scale: ASGI support now available on Cerebrium

Kyle Gani

Senior Technical Product Manager

Looking to deploy machine learning models in production? Struggling with real-time inference or model serving at scale? Trying to run Gradio and Streamlit apps internally, but finding it cumbersome? Cerebrium now supports ASGI (Asynchronous Server Gateway Interface) applications, solving common MLOps challenges around model deployment, scalability, and real-time processing.

Let's explore what ASGI support on the Cerebrium platform unlocks for you (including examples), as well as how to build and deploy an ASGI application in under 5 minutes.

What ASGI support enables you to do

ASGI is the backbone of modern Python web applications, enabling your applications to handle many concurrent connections efficiently. With Cerebrium's ASGI support, you now have complete control over how your ML applications handle requests, process data, and run inference. This means you can build everything from real-time streaming applications to complex ML pipelines, all while maintaining cost efficiency and performance.
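
If you haven't worked with ASGI directly, it helps to see how small the interface is: an ASGI application is just an async callable that the server (such as uvicorn) invokes for each connection. Here is a minimal sketch of a complete raw ASGI app; frameworks like FastAPI and Gradio ultimately expose an object with this same interface.

async def app(scope, receive, send):
    # Only handle plain HTTP requests in this sketch
    if scope["type"] != "http":
        return
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello from ASGI"})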

Example applications you can try now

  • Implement WebSocket streaming for real-time voice applications (a minimal sketch follows this list). Check out our updated Twilio voice agent example.

  • Build intuitive dashboards and web interfaces for your applications. Take a look at our Gradio example.

  • Batch-process your requests for cost and performance efficiency (example coming soon)
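
To make the first item concrete, here is a minimal FastAPI WebSocket endpoint that accepts a connection and streams responses back. The /stream path and the echo logic are illustrative placeholders, not code from the Twilio example:

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()


@app.websocket("/stream")
async def stream(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            # Receive a chunk from the client; a voice agent would run
            # inference here and stream results back as they become ready
            text = await ws.receive_text()
            await ws.send_text(f"processed: {text}")
    except WebSocketDisconnect:
        pass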

The best part? Because all these applications run in the same cluster on Cerebrium, they communicate with ultra-low latency: your monitoring dashboard gets instant updates from your model application, your batch-processing system can efficiently manage GPU resources, and your real-time applications maintain consistent performance.

Want to see how this works in practice? Let's build an ASGI FastAPI application. Check out the complete code here.

Deploy your ASGI app to production

Here's how to deploy a FastAPI ASGI application quickly and easily on the Cerebrium platform. Add the following to your main.py file:

from fastapi import FastAPI, Body
from loguru import logger
from pydantic import BaseModel

app = FastAPI()


class Item(BaseModel):
    # Add your input parameters here
    prompt: str


@app.post("/predict")
def predict(item: Item = Body(...)):
    # Access the parameters from your inference request
    prompt = item.prompt
    logger.info(f"Received a prompt of: `{prompt}`")

    return {
        "your_prompt": prompt,
        "your_other_return": "success",
    }  # return your results


# Health check endpoint
@app.get("/health")
def health():
    return {"status": "healthy"}

Next, update your cerebrium.toml file to include the following configuration (min_replicas = 0 lets your app scale to zero when idle, and replica_concurrency = 1000 lets each replica serve up to 1,000 concurrent requests, which ASGI's async model makes practical):

[cerebrium.deployment]
name = "30-asgi-fast-api-server"
python_version = "3.12"
disable_auth = true
include = ["*"]
exclude = [".*"]
shell_commands = []

[cerebrium.runtime.custom]
port = 5000
entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "5000"]
healthcheck_endpoint = "/health"

[cerebrium.hardware]
cpu = 2
memory = 4.0
compute = "CPU"

[cerebrium.scaling]
min_replicas = 0
max_replicas = 2
cooldown = 10
replica_concurrency = 1000

[cerebrium.dependencies.pip]
pydantic = "latest"
loguru = "latest"
fastapi = "latest"
uvicorn = "latest"

Deploy to production with a single command:

cerebrium deploy -y

Lastly, calling your application is as simple as running this command in your terminal (note the placeholders, which you'll need to replace with your own project ID and REST API key):

curl --location 'https://api.cortex.cerebrium.ai/v4/<your-project-id>/30-asgi-fastapi-server/predict' \
--header 'Authorization: Bearer <your-rest-api-key>' \
--header 'Content-Type: application/json' \
--data '{
    "prompt": "your value here"
}'
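
If you'd rather call your application from Python, the equivalent request looks like this, using the requests library (same placeholders as the curl command above):

import requests

# Replace the placeholders with your own project ID and REST API key
url = "https://api.cortex.cerebrium.ai/v4/<your-project-id>/30-asgi-fastapi-server/predict"
headers = {"Authorization": "Bearer <your-rest-api-key>"}

response = requests.post(url, headers=headers, json={"prompt": "your value here"})
print(response.json())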

Need support?

Building an ML startup? We know costs can be challenging when you're just getting started. Reach out to support@cerebrium.ai for additional credits and deployment support.

Want to learn more about deploying ML models in production?

  1. Explore our examples repository

  2. Start with $30 in free credits (no credit card required for signup). Sign up here.

  3. Join our Discord community for deployment support.

Don't forget to star our example repository and share your ML deployment success stories. Our team is constantly adding new examples based on real-world deployment scenarios.
