Introduction
Mystic AI, an early pioneer that pushed the industry forward, is sunsetting its services. This guide covers migrating apps from Mystic to Cerebrium to keep them running.
It walks through converting existing Mystic code (using a Stable Diffusion example) and configuration to the Cerebrium platform, including optimizing the deployment for performance and cost efficiency.
Key Differences
Cerebrium helps teams deploy and run models efficiently. The infrastructure is designed for reliable performance:
- The average model cold-starts in 2-5 seconds.
- Updates to your code deploy quickly, taking only 8-14 seconds.
- The platform maintains 99.9% uptime.
Cerebrium provides precise control over computing resources. Instead of managing entire instances, select the exact CPU, memory, and GPU power needed. Billing is per-second for actual resource usage. Use the pricing calculator for cost estimates.
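Because billing is per-second, a back-of-the-envelope estimate is simple arithmetic. The rates in this sketch are placeholders, not Cerebrium's actual prices; plug in the figures from the pricing calculator and the resources from your own config:

# Hypothetical per-second rates; take real figures from the pricing calculator
GPU_RATE = 0.000306   # per GPU-second (placeholder)
CPU_RATE = 0.0000112  # per vCPU-second (placeholder)
MEM_RATE = 0.0000050  # per GB-second (placeholder)

# Resources matching the cerebrium.toml below: 1 GPU, 4 CPUs, 16 GB memory
runtime_seconds = 12  # e.g. one Stable Diffusion request

cost = runtime_seconds * (1 * GPU_RATE + 4 * CPU_RATE + 16.0 * MEM_RATE)
print(f"~${cost:.4f} for this request")  # billed only while a replica runs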
Migration Process
1. Project Setup and Configuration
Install Cerebrium’s command-line tool and create the project:
pip install cerebrium --upgrade
cerebrium login # You'll be redirected to the dashboard for login
cerebrium init stable-diffusion
cd stable-diffusion
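After init, the project scaffold typically contains a main.py stub for the model code and a cerebrium.toml for deployment configuration; the next two steps fill both in:

stable-diffusion/
├── main.py          # model code (Step 2)
└── cerebrium.toml   # deployment configuration (below)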
Convert the existing Mystic configuration to Cerebrium’s format. A typical Mystic configuration:
# Mystic's pipeline.yaml
runtime:
  container_commands:
    - apt-get update
    - apt-get install -y git
  python:
    version: "3.10"
    requirements:
      - pipeline-ai
      - diffusers==0.24.0
      - torch==2.1.1
      - transformers==4.35.2
      - accelerate==0.25.0
  cuda_version: "11.4"
accelerators:
  - "nvidia_a10"
accelerator_memory: null
pipeline_graph: sd_pipeline:pipeline_graph
pipeline_name: <YOUR_USERNAME>/stable-diffusion-v1.5
extras: {}
Becomes this Cerebrium TOML config:
# cerebrium.toml
[cerebrium.deployment]
name = "stable-diffusion"
python_version = "3.11"
docker_base_image_url = "debian:bookworm-slim"
include = ["./*", "main.py", "cerebrium.toml"]
exclude = [".*"]
[cerebrium.hardware]
compute = "AMPERE_A10" # Choose your GPU type
cpu = 4 # Number of CPU cores
memory = 16.0 # Memory in GB
gpu_count = 1 # Number of GPUs
[cerebrium.scaling]
min_replicas = 0 # Scale to zero when idle to save costs
max_replicas = 2 # Scale up to handle increased traffic
cooldown = 60 # Seconds a replica sits idle before it is scaled down
replica_concurrency = 1 # Number of concurrent requests a single container handles
[cerebrium.dependencies.pip]
torch = ">=2.0.0"
pydantic = "latest"
transformers = "latest"
accelerate = "latest"
diffusers = "latest"
safetensors = "latest"
xformers = "latest"
2. Code Migration
Convert the model implementation. A typical Mystic pipeline:
import typing as t
from pathlib import Path

from PIL.Image import Image
from pipeline.cloud.pipelines import run_pipeline
from pipeline.objects.graph import InputField, InputSchema
from pipeline import File, Pipeline, Variable, entity, pipe

HF_MODEL_ID = "runwayml/stable-diffusion-v1-5"


class ModelKwargs(InputSchema):
    num_images_per_prompt: int | None = InputField(
        title="num_images_per_prompt",
        description="The number of images to generate per prompt.",
        default=1,
        optional=True,
    )
    height: int | None = InputField(
        title="height",
        description="The height in pixels of the generated image.",
        default=512,
        optional=True,
        multiple_of=64,
        ge=64,
    )
    width: int | None = InputField(
        title="width",
        description="The width in pixels of the generated image.",
        default=512,
        optional=True,
        multiple_of=64,
        ge=64,
    )
    num_inference_steps: int | None = InputField(
        title="num_inference_steps",
        description=(
            "The number of denoising steps. More denoising steps "
            "usually lead to a higher quality image at the expense "
            "of slower inference."
        ),
        default=50,
        optional=True,
    )


@entity
class StableDiffusionModel:
    def __init__(self) -> None:
        self.model = None
        self.device = None

    @pipe(run_once=True, on_startup=True)
    def load(self) -> None:
        """Load the HF model into memory."""
        import torch
        from diffusers import StableDiffusionPipeline

        device = torch.device("cuda") if torch.cuda.is_available() else "cpu"
        self.model = StableDiffusionPipeline.from_pretrained(HF_MODEL_ID)
        self.model.to(device)

    @pipe
    def predict(self, prompt: str, model_kwargs: ModelKwargs) -> t.List[Image]:
        """Generates a list of PIL images."""
        return self.model(prompt=prompt, **model_kwargs.to_dict()).images

    @pipe
    def postprocess(self, images: t.List[Image]) -> t.List[File]:
        """Creates a list of Files from the `PIL` images."""
        output_images = []
        for i, image in enumerate(images):
            path = Path(f"/tmp/sd/image-{i}.jpg")
            path.parent.mkdir(parents=True, exist_ok=True)
            image.save(str(path))
            output_images.append(File(path=path, allow_out_of_context_creation=True))
        return output_images


with Pipeline() as builder:
    prompt = Variable(
        str,
        title="prompt",
        description="The prompt to guide image generation",
        max_length=512,
    )
    model_kwargs = Variable(ModelKwargs)

    model = StableDiffusionModel()
    model.load()

    images: t.List[Image] = model.predict(prompt, model_kwargs)
    output: t.List[File] = model.postprocess(images)

    builder.output(output)

pipeline_graph = builder.get_pipeline()
The Cerebrium equivalent in main.py:
import base64
import io

import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from pydantic import BaseModel


# Define the structure of input parameters
class Item(BaseModel):
    prompt: str
    height: int
    width: int
    num_inference_steps: int
    num_images_per_prompt: int


# Load the model once at import time (container startup) and set it up for inference
model_id = "stabilityai/stable-diffusion-2-1"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
pipe = pipe.to("cuda")


# The endpoint we'll call to run inference
def predict(
    prompt: str,
    height: int = 512,
    width: int = 512,
    num_inference_steps: int = 25,
    num_images_per_prompt: int = 1,
):
    # Validate the incoming parameters
    item = Item(
        prompt=prompt,
        height=height,
        width=width,
        num_inference_steps=num_inference_steps,
        num_images_per_prompt=num_images_per_prompt,
    )

    images = pipe(
        prompt=item.prompt,
        height=item.height,
        width=item.width,
        num_images_per_prompt=item.num_images_per_prompt,
        num_inference_steps=item.num_inference_steps,
    ).images

    # Encode each generated image as base64 so it can be returned as JSON
    finished_images = []
    for image in images:
        buffered = io.BytesIO()
        image.save(buffered, format="PNG")
        finished_images.append(base64.b64encode(buffered.getvalue()).decode("utf-8"))

    return finished_images
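Before deploying, you can optionally smoke-test the handler locally. This is a minimal sketch, assuming you run it from the project root; note that importing main loads the model, so a CUDA GPU is required:

# Optional local smoke test; importing main loads the model onto the GPU
import base64
from main import predict

images = predict(prompt="a photo of an astronaut riding a horse on mars")
with open("astronaut.png", "wb") as f:
    f.write(base64.b64decode(images[0]))  # decode the first base64-encoded PNG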
3. Deployment
Deploy your model with a single command:
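cerebrium deploy
The CLI builds the app from your cerebrium.toml and prints the inference endpoint once the deployment completes.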
4. Inference
Once your app is deployed, you can make requests to your model using the example cURL request below:
curl --location 'https://api.aws.us-east-1.cerebrium.ai/v4/p-<YOUR PROJECT ID>/stable-diffusion/predict' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR TOKEN HERE>' \
--data '{
"prompt": "a photo of an astronaut riding a horse on mars"
}'
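The same call from Python with the requests library. This sketch assumes the response JSON wraps the handler's return value in a "result" field; check your deployment's actual response shape:

import base64
import requests

# Placeholders: substitute your project ID and API token
url = "https://api.aws.us-east-1.cerebrium.ai/v4/p-<YOUR PROJECT ID>/stable-diffusion/predict"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <YOUR TOKEN HERE>",
}
payload = {"prompt": "a photo of an astronaut riding a horse on mars"}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()

# Assumption: the handler's list of base64 PNGs comes back under "result"
images = response.json()["result"]
with open("output.png", "wb") as f:
    f.write(base64.b64decode(images[0]))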
The Cerebrium platform provides the tools and support needed for a smooth transition.
Connect with other developers and the Cerebrium team for faster responses and issue resolution:
- Join the Discord server.
- Join the Slack workspace.
These communities offer migration support, quick technical answers, best practices, and feature updates.