Partner Services are available in CLI version 1.39.0 and later.
Partner Services are in beta. They’re available to all users and ready for
production workloads, but expect occasional rough edges while the integrations
mature. Reach out to support if you hit any
issues.
Benefits of Partner Services
Partner Services provide:
- Quick and easy deployment
- Independent scaling of each service
- Reduced costs by running models on Cerebrium’s optimized runtime
- Reduced latency by running models on the same network as the app
- Deployment to specific regions for data compliance and latency requirements
Getting Started
Configure service-specific requirements through the Cerebrium platform. Refer to individual service pages linked above for detailed requirements, which may include:
- API keys and authentication details
- Service-specific configuration parameters
- Resource requirements and limitations
Scaling and Concurrency
Partner Services support independent scaling configurations:
- Use the min_replicas and max_replicas parameters to control the number of instances
- The replica_concurrency parameter determines how many concurrent requests each instance can handle
- Adjust the cooldown parameter to control the time window that must pass at reduced concurrency before scaling down
- Adjust the hardware section to control the instance type, which affects performance and/or cost
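
To get a feel for how these parameters interact, the following Python sketch estimates how many instances a given load would call for. It is an illustration only, not Cerebrium's autoscaling logic: the `ScalingConfig` class and the `replicas_needed` and `peak_capacity` helpers are hypothetical names, the sketch assumes `cooldown` is expressed in seconds, and it assumes scaling is driven purely by the number of in-flight requests.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingConfig:
    """Hypothetical container for the scaling parameters described above."""

    min_replicas: int
    max_replicas: int
    replica_concurrency: int
    cooldown: int  # assumed to be seconds; not used in the estimate below

    def __post_init__(self):
        # Reject combinations that could never describe a valid deployment.
        if self.replica_concurrency < 1:
            raise ValueError("replica_concurrency must be at least 1")
        if not 0 <= self.min_replicas <= self.max_replicas:
            raise ValueError("need 0 <= min_replicas <= max_replicas")


def replicas_needed(config: ScalingConfig, in_flight_requests: int) -> int:
    """Estimate the instance count for a number of concurrent requests.

    Divides the load by the per-instance concurrency, rounds up, then
    clamps the result to the [min_replicas, max_replicas] range.
    """
    wanted = math.ceil(in_flight_requests / config.replica_concurrency)
    return max(config.min_replicas, min(config.max_replicas, wanted))


def peak_capacity(config: ScalingConfig) -> int:
    """Most concurrent requests the service can handle when fully scaled out."""
    return config.max_replicas * config.replica_concurrency


# Example values chosen for illustration only.
config = ScalingConfig(min_replicas=1, max_replicas=5, replica_concurrency=4, cooldown=30)
print(replicas_needed(config, 10))   # 3 instances: ceil(10 / 4)
print(replicas_needed(config, 0))    # 1 instance: held up by min_replicas
print(replicas_needed(config, 100))  # 5 instances: capped by max_replicas
print(peak_capacity(config))         # 20 concurrent requests at most
```

Under these assumptions, raising replica_concurrency lowers the number of instances needed for the same load, while max_replicas sets a hard ceiling on total capacity. Requests beyond that ceiling would have to wait or be rejected, depending on how the service handles overload.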