Inter-cluster routing enables direct, low-latency communication between Cerebrium apps within the same region. Because traffic stays off the public internet, calls between apps skip external network hops. Each app still scales independently according to its own scaling configuration. Inter-cluster routing provides:
- Low latency: Direct container-to-container communication within the same region (~0.3–1 ms typical)
- High bandwidth: Up to 50 Gbps between containers
- No public internet: Apps communicate directly without external routing
- Observability: All requests appear in the Cerebrium dashboard with full logs, payloads, and latency metrics
How It Works

Apps call one another through an internal endpoint with the following pattern:

http://api.aws/v4/<project_id>/<app_name>/<func_name>
The endpoint pattern is the same in every region, so URLs do not change when you deploy to multiple locations. However, inter-cluster routing only works between apps deployed in the same region. Requests never traverse the public internet; they stay within the cluster network, with typical latencies of 0.3–1 ms and up to 50 Gbps of bandwidth between containers.
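As an illustration, the sketch below fills in the endpoint pattern and calls another app over plain HTTP using only the Python standard library. It is a minimal sketch, not Cerebrium's client API. The page does not confirm several details, so treat them as assumptions: that the target function accepts a JSON body via `POST`, that an optional bearer token is passed in an `Authorization` header, and that the response is JSON. The helper names (`build_internal_url`, `make_request`, `call_app`) and the IDs in the usage line are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Base of the internal endpoint pattern described above.
INTERNAL_BASE = "http://api.aws/v4"


def build_internal_url(project_id: str, app_name: str, func_name: str) -> str:
    """Fill the <project_id>/<app_name>/<func_name> placeholders (hypothetical helper)."""
    # Percent-encode each segment so stray characters cannot alter the path.
    parts = [urllib.parse.quote(p, safe="") for p in (project_id, app_name, func_name)]
    return "/".join([INTERNAL_BASE, *parts])


def make_request(project_id, app_name, func_name, payload, token=None):
    """Build, but do not send, a request to another app in the same region.

    Assumptions (not confirmed by the docs): JSON body sent via POST,
    optional bearer token in the Authorization header.
    """
    url = build_internal_url(project_id, app_name, func_name)
    headers = {"Content-Type": "application/json"}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def call_app(project_id, app_name, func_name, payload, token=None, timeout=10.0):
    """Send the request and decode the response, assuming it is JSON."""
    req = make_request(project_id, app_name, func_name, payload, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

From code running inside a Cerebrium app, a call might look like `call_app("p-1234", "embedder", "embed", {"text": "hello"})`, where the project ID, app name, and function name are placeholders. Keeping URL construction separate from sending makes it easy to inspect or log the exact request before it goes out.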
gRPC is not currently supported but is on the roadmap.