# Cerebrium

> Cerebrium is serverless GPU infrastructure for real-time AI: voice agents, video models, LLMs, and custom ML apps. Sub-second cold starts, pay-per-second billing, no Kubernetes.

## What we do

- **Voice AI**: Real-time end-to-end pipelines for voice agents (STT + LLM + TTS).
- **Video & generative media**: Low-latency inference for video generation, image diffusion, and avatars.
- **LLMs**: Serverless deployment for OpenAI-compatible endpoints, open-source models (DeepSeek, Llama, Orpheus), and custom fine-tunes.
- **General ML**: Any Python workload, any GPU (L4, L40s, A100, H100, H200), pay only for the compute time used.

## Key Facts

- Product: serverless GPU infrastructure for real-time AI inference (voice, video, LLMs, and custom ML).
- Supported GPUs: L4, L40s, A100, H100, H200, billed per second of compute used, with no idle charges or minimum reservations.
- Cold starts: sub-second on the largest models (H100/H200 inference).
- Voice pipelines: sub-500ms end-to-end latency (STT + LLM + TTS).
- Deployment: multi-region (US + EU) for data residency; bring-your-own-code (any Python script or container, no lock-in).
- Compliance: SOC 2, HIPAA, GDPR, ISO.
- Reference customers: Resemble AI, Camb AI, Telli, Amira Learning, Invofox, Creatium.
- Open source: github.com/CerebriumAI, including a 522-star examples repo (voice agents, LLMs, video, RAG).

## Key pages

- [Serverless GPU Infrastructure for Real-Time AI](https://cerebrium.ai/): Deploy voice agents, video models, and LLMs on serverless GPUs with sub-second cold starts. Pay-per-second pricing. No Kubernetes.
- [Pay-Per-Second Pricing for Serverless AI](https://cerebrium.ai/pricing): Pay for compute by the second, not the hour. Transparent serverless GPU pricing for voice, LLMs, and video. No commitment, no idle costs.
- [Our Mission: Real-Time AI Infrastructure](https://cerebrium.ai/about): Cerebrium is the team building global serverless GPU infrastructure for real-time AI model applications.
- [Book a Demo: Technical Architecture Review](https://cerebrium.ai/book-demo): See the technical architecture behind AI teams deploying real-time voice agents, LLMs, and video models on Cerebrium. 30-minute demo with our team.
- [Contact: Sales, Support, Partnerships](https://cerebrium.ai/contact): Get in touch with Cerebrium for sales, partnerships, support, or enterprise inquiries. Real-time replies during business hours.
- [Brand Assets: Logos & Guidelines](https://cerebrium.ai/brand-assets): Download Cerebrium logos, color palette, typography, and brand guidelines for press, partnerships, and media coverage.

## Use cases

- [Large Language Models](https://cerebrium.ai/use-cases/large-language-models): Run and deploy LLMs at scale.
- [Voice](https://cerebrium.ai/use-cases/voice): Infrastructure built for low-latency voice at scale.
- [Image & Video](https://cerebrium.ai/use-cases/image-and-video): Run image and video pipelines at scale.

## Documentation

- [Documentation home](https://docs.cerebrium.ai/getting-started/introduction): Full developer docs (hosted on docs.cerebrium.ai).
- [Cerebrium examples](https://github.com/CerebriumAI/examples): 522-star reference repo covering voice agents, LLMs, video, RAG.

## Blog (selected high-value posts)

- [How much does a H100 cost? Cost comparison](https://cerebrium.ai/blog/how-much-does-a-h100-cost-cost-comparision): GPU cost comparison.
- [How much does a H200 cost? 2025 Guide](https://cerebrium.ai/blog/how-much-does-a-h200-cost-2025-guide): H200 pricing breakdown.
- [Top 5 Serverless GPU providers](https://cerebrium.ai/blog/top-5-serverless-gpu-providers): Competitive landscape.
- [Creating a realtime RAG voice agent](https://cerebrium.ai/blog/creating-a-realtime-rag-voice-agent): Tutorial.
- [Deploying DeepSeek-R1: A Guide to a Serverless, High-Performing OpenAI-Compatible Endpoint](https://cerebrium.ai/blog/deploying-deepseek-r1-a-guide-to-a-serverless-high-performaning-openai-compatible-endpoint): OpenAI-compatible endpoint guide.
- [Orpheus TTS: How to Deploy Orpheus at Scale for Production Inference](https://cerebrium.ai/blog/orpheus-tts-how-to-deploy-orpheus-at-scale-for-production-inference): Production TTS deployment.
- [Deploying Sesame CSM: The Most Realistic Voice Model as an API](https://cerebrium.ai/blog/deploying-sesame-csm-the-most-realistic-voice-model): Voice model guide.
- [Launch Week Day 3: Announcing Multi-Region Deployments](https://cerebrium.ai/blog/launch-week-day-3-annoucing-multi-region-deployments): Product announcement.
- [Rethinking Container Image Distribution to eliminate cold starts](https://cerebrium.ai/blog/rethinking-container-image-distribution-to-eliminate-cold-starts): Engineering deep dive.
- [The Shortcomings of Celery + Redis for ML Workloads and How Cerebrium Solves It](https://cerebrium.ai/blog/celery-redis-vs-cerebrium): Migration comparison.
- [Faster Whisper Transcription: How to Maximize Performance for Real-Time Audio-to-Text](https://cerebrium.ai/blog/faster-whisper-transcription-how-to-maximize-performance-for-real-time-audio-to-text): Real-time STT.
- [Integrating PayPal’s Model Context Protocol (MCP) into a Real-time Voice Agent](https://cerebrium.ai/blog/integrating-paypal-s-model-context-protocol-mcp-into-a-real-time-voice-agent): MCP integration.
- [Introducing Cerebrium run: The Fastest Way to Execute Cloud Code](https://cerebrium.ai/blog/introducing-cerebrium-run-the-fastest-way-to-execute-cloud-code): Cloud-code execution.
- [Blog index](https://cerebrium.ai/blog): All posts.

## Contact

- Website: https://cerebrium.ai/
- Book a demo: https://cerebrium.ai/book-demo
- General inquiries: https://cerebrium.ai/contact

## Optional

- [GitHub @CerebriumAI](https://github.com/CerebriumAI): Open-source examples and tools.
- [Privacy policy](https://cerebrium.ai/privacy)
- [Terms of service](https://cerebrium.ai/terms-of-service)