# Cerebrium

> Cerebrium is serverless GPU infrastructure for real-time AI — voice agents, video models, LLMs, and custom ML apps. Sub-second cold starts, pay-per-second billing, no Kubernetes.

## What we do

- **Voice AI**: Real-time end-to-end pipelines for voice agents (STT + LLM + TTS).
- **Video & generative media**: Low-latency inference for video generation, image diffusion, and avatars.
- **LLMs**: Serverless deployment for OpenAI-compatible endpoints, open-source models (DeepSeek, Llama, Orpheus), and custom fine-tunes.
- **General ML**: Any Python workload, any GPU (L4, L40S, A100, H100, H200); pay only for the compute time you use.

## Key Facts

- Product: serverless GPU infrastructure for real-time AI inference — voice, video, LLMs, and custom ML.
- Supported GPUs: L4, L40S, A100, H100, H200. Billing is per second of compute used, with no idle charges or minimum reservations.
- Cold starts: sub-second on the largest models (H100/H200 inference).
- Voice pipelines: sub-500ms end-to-end latency (STT + LLM + TTS).
- Deployment: multi-region (US + EU) for data residency; bring-your-own-code (any Python script or container, no lock-in).
- Compliance: SOC 2, HIPAA, GDPR, ISO.
- Reference customers: Resemble AI, Camb AI, Telli, Amira Learning, Invofox, Creatium.
- Open source: github.com/CerebriumAI — 522-star examples repo (voice agents, LLMs, video, RAG).
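To make the per-second billing claim concrete, here is a minimal Python sketch that compares per-second and per-hour billing for a short burst of GPU work. The hourly rate is a made-up placeholder, not Cerebrium's published price. Rounding partial seconds up to the next whole second is also an assumption about how billing works. See the pricing page for actual rates and rules.

```python
import math

# Hypothetical rate for illustration only; NOT Cerebrium's actual price.
HOURLY_RATE_USD = 3.60  # assumed price for one GPU-hour


def per_second_cost(seconds_used: float, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Cost if usage is billed per second (assumed: rounded up to a whole second)."""
    billed_seconds = math.ceil(seconds_used)
    return billed_seconds * hourly_rate / 3600


def per_hour_cost(seconds_used: float, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Cost if the same usage were billed in whole hours, for comparison."""
    billed_hours = math.ceil(seconds_used / 3600)
    return billed_hours * hourly_rate


if __name__ == "__main__":
    used = 90  # e.g. 90 seconds of GPU work across a burst of requests
    print(f"per-second billing: ${per_second_cost(used):.4f}")
    print(f"per-hour billing:   ${per_hour_cost(used):.2f}")
```

Under these assumptions, 90 seconds of work is billed as 90 seconds rather than a full hour. That gap is what per-second billing removes for bursty, real-time inference traffic.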

## Key pages

- [Serverless GPU Infrastructure for Real-Time AI](https://cerebrium.ai/): Deploy voice agents, video models, and LLMs on serverless GPUs with sub-second cold starts. Pay-per-second pricing. No Kubernetes.
- [Pay-Per-Second Pricing for Serverless AI](https://cerebrium.ai/pricing): Pay for compute by the second, not the hour. Transparent serverless GPU pricing for voice, LLMs, and video. No commitment, no idle costs.
- [Our Mission — Real-Time AI Infrastructure](https://cerebrium.ai/about): Cerebrium is the team building global serverless GPU infrastructure for real-time AI model applications.
- [Book a Demo — Technical Architecture Review](https://cerebrium.ai/book-demo): See the technical architecture behind AI teams deploying real-time voice agents, LLMs, and video models on Cerebrium. 30-minute demo with our team.
- [Contact — Sales, Support, Partnerships](https://cerebrium.ai/contact): Get in touch with Cerebrium for sales, partnerships, support, or enterprise inquiries. Real-time replies during business hours.
- [Brand Assets — Logos & Guidelines](https://cerebrium.ai/brand-assets): Download Cerebrium logos, color palette, typography, and brand guidelines for press, partnerships, and media coverage.

## Use cases

- [Large Language Models](https://cerebrium.ai/use-cases/large-language-models): Run and deploy LLMs at scale
- [Voice](https://cerebrium.ai/use-cases/voice): Infrastructure built for low-latency voice at scale
- [Image & Video](https://cerebrium.ai/use-cases/image-and-video): Run image and video pipelines at scale

## Documentation

- [Documentation home](https://docs.cerebrium.ai/getting-started/introduction): Full developer docs (hosted on docs.cerebrium.ai).
- [Cerebrium examples](https://github.com/CerebriumAI/examples): 522-star reference repo covering voice agents, LLMs, video, RAG.

## Blog (selected high-value posts)

- [How much does an H100 cost? Cost comparison](https://cerebrium.ai/blog/how-much-does-a-h100-cost-cost-comparision): GPU cost comparison.
- [How much does an H200 cost? 2025 Guide](https://cerebrium.ai/blog/how-much-does-a-h200-cost-2025-guide): H200 pricing breakdown.
- [Top 5 Serverless GPU providers](https://cerebrium.ai/blog/top-5-serverless-gpu-providers): Competitive landscape.
- [Creating a realtime RAG voice agent](https://cerebrium.ai/blog/creating-a-realtime-rag-voice-agent): Tutorial.
- [Deploying DeepSeek-R1: A Guide to a Serverless, High-Performing OpenAI-Compatible Endpoint](https://cerebrium.ai/blog/deploying-deepseek-r1-a-guide-to-a-serverless-high-performaning-openai-compatible-endpoint): OpenAI-compatible endpoint guide.
- [Orpheus TTS: How to Deploy Orpheus at Scale for Production Inference](https://cerebrium.ai/blog/orpheus-tts-how-to-deploy-orpheus-at-scale-for-production-inference): Production TTS deployment.
- [Deploying Sesame CSM: The Most Realistic Voice Model as an API](https://cerebrium.ai/blog/deploying-sesame-csm-the-most-realistic-voice-model): Voice model guide.
- [Launch Week Day 3: Announcing Multi-Region Deployments](https://cerebrium.ai/blog/launch-week-day-3-annoucing-multi-region-deployments): Product announcement.
- [Rethinking Container Image Distribution to eliminate cold starts](https://cerebrium.ai/blog/rethinking-container-image-distribution-to-eliminate-cold-starts): Engineering deep dive.
- [The Shortcomings of Celery + Redis for ML Workloads and How Cerebrium Solves It](https://cerebrium.ai/blog/celery-redis-vs-cerebrium): Migration comparison.
- [Faster Whisper Transcription: How to Maximize Performance for Real-Time Audio-to-Text](https://cerebrium.ai/blog/faster-whisper-transcription-how-to-maximize-performance-for-real-time-audio-to-text): Real-time STT.
- [Integrating PayPal’s Model Context Protocol (MCP) into a Real-time Voice Agent](https://cerebrium.ai/blog/integrating-paypal-s-model-context-protocol-mcp-into-a-real-time-voice-agent): MCP integration.
- [Introducing Cerebrium run: The Fastest Way to Execute Cloud Code](https://cerebrium.ai/blog/introducing-cerebrium-run-the-fastest-way-to-execute-cloud-code): Cloud-code execution.
- [Blog index](https://cerebrium.ai/blog): All posts.

## Contact

- Website: https://cerebrium.ai/
- Book a demo: https://cerebrium.ai/book-demo
- General inquiries: https://cerebrium.ai/contact

## Optional

- [GitHub @CerebriumAI](https://github.com/CerebriumAI): Open-source examples and tools.
- [Privacy policy](https://cerebrium.ai/privacy)
- [Terms of service](https://cerebrium.ai/terms-of-service)