Large Language Models
Cerebrium enables developers to deploy serverless LLMs simply, scalably, and cost-effectively.
Why choose Cerebrium?
Innovate with features built for rapid deployment and inference of your models
Batching
Boost GPU utilisation and reduce costs with continuous and dynamic request batching. Increase your throughput without sacrificing latency and improve your user experience.
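The core idea behind dynamic batching is to queue incoming requests and flush them to the GPU either when the batch fills up or when a short deadline expires, so the GPU serves many callers per forward pass. The sketch below illustrates that idea only; it is not Cerebrium's implementation, and names such as run_model, MAX_BATCH_SIZE, and MAX_WAIT_MS are hypothetical.

```python
import asyncio
import time

MAX_BATCH_SIZE = 8  # flush once this many requests are queued
MAX_WAIT_MS = 50    # ...or once the oldest request has waited this long

queue: asyncio.Queue = asyncio.Queue()

def run_model(prompts: list[str]) -> list[str]:
    # Placeholder for one batched forward pass on the GPU.
    return [f"completion for: {p}" for p in prompts]

async def handle_request(prompt: str) -> str:
    # Each caller enqueues its prompt with a future and awaits the result.
    future = asyncio.get_running_loop().create_future()
    await queue.put((prompt, future))
    return await future

async def batch_worker() -> None:
    # Collect requests until the batch is full or the deadline passes,
    # then serve the whole batch with a single model call.
    while True:
        prompt, future = await queue.get()
        batch = [(prompt, future)]
        deadline = time.monotonic() + MAX_WAIT_MS / 1000
        while len(batch) < MAX_BATCH_SIZE:
            timeout = deadline - time.monotonic()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        outputs = run_model([p for p, _ in batch])
        for (_, fut), out in zip(batch, outputs):
            fut.set_result(out)

async def main() -> None:
    asyncio.create_task(batch_worker())
    results = await asyncio.gather(
        *(handle_request(f"prompt {i}") for i in range(10))
    )
    print(results)

if __name__ == "__main__":
    asyncio.run(main())
```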
Streaming for real-time responsiveness
Stream LLM outputs to reduce perceived latency and create more responsive applications, delivering a better overall experience for your end-users.
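The common pattern is to expose the model as a generator that yields tokens as they are produced, rather than returning the full completion at once. Here is a minimal sketch of the idea in Python; the function names are hypothetical, not Cerebrium's API.

```python
import time
from typing import Iterator

def generate_tokens(prompt: str) -> Iterator[str]:
    # Placeholder for a model producing tokens one at a time.
    for token in ["Stream", "ing ", "cuts ", "perceived ", "latency."]:
        time.sleep(0.05)  # simulate per-token generation time
        yield token

def stream_completion(prompt: str) -> Iterator[str]:
    # Yield each token as soon as it is produced instead of waiting
    # for the full completion, so the client sees output immediately.
    for token in generate_tokens(prompt):
        yield token

if __name__ == "__main__":
    for chunk in stream_completion("hello"):
        print(chunk, end="", flush=True)
```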
The perfect hardware for every workload
Our platform offers a wide range of options from CPUs to the latest NVIDIA H100s, matching your workload to the most cost-effective hardware.
Deploy your LLM in 5 minutes
Go from local development to production-ready applications in just five minutes with our intuitive platform and pre-configured starter templates.
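As a rough sketch of what a deployed app can look like, the hypothetical main.py below loads a model once at startup and exposes a handler function; the function name and signature are illustrative, so consult the starter templates for the exact entrypoint conventions.

```python
# Hypothetical main.py for a deployed LLM endpoint; the handler name and
# signature are illustrative, not Cerebrium's documented conventions.
from transformers import pipeline

# Loaded once at startup so warm requests skip model initialisation.
generator = pipeline("text-generation", model="gpt2")

def predict(prompt: str, max_new_tokens: int = 64) -> dict:
    # Run a single completion and return a JSON-serialisable payload.
    output = generator(prompt, max_new_tokens=max_new_tokens)
    return {"completion": output[0]["generated_text"]}
```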
Real-world applications
What some of our customers are doing…
Cerebrium's users rely on Large Language Models (LLMs) to process information, optimise operations, and deliver practical value to their customers and partners.
Translation
Our users apply LLMs to translate documents, audio, and video across multiple languages and contexts.
Generation & Summarisation
Our users utilise LLMs to generate and summarise content, transforming complex information into clear, concise summaries across a variety of formats.
Retrieval-Augmented Generation
Our users combine language understanding with precision data retrieval for unparalleled accuracy and relevance in their ML applications.