The Rime Partner Service is available from CLI version 1.39.0 and later.
The Rime Partner Service is in beta. It’s available to all users and ready for production workloads, but expect occasional rough edges while the integration matures. Reach out to support if you hit any issues.
Setup
- Create a Rime account and get an API key. Add the key as a secret in Cerebrium with the name `RIME_API_KEY`.
- Create a Cerebrium app with the CLI:
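A minimal sketch of this step, assuming the standard `cerebrium init` command; the app name `rime-service` is an illustrative example:

```shell
# Create a new Cerebrium app scaffold (the app name is an example)
cerebrium init rime-service
cd rime-service
```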
- Rime services use a simplified TOML configuration with the `[cerebrium.runtime.rime]` section. Create a `cerebrium.toml` file with the following:
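A sketch of the configuration, assuming standard `cerebrium.toml` section names; the `[cerebrium.runtime.rime]` keys are described in the Runtime Configuration table below, and the values shown are illustrative:

```toml
# Illustrative cerebrium.toml; verify section names and values against the
# current Cerebrium docs for your deployment.
[cerebrium.deployment]
name = "rime-service"
disable_auth = true   # the Rime API key in the request header handles auth

[cerebrium.runtime.rime]
port = 8001           # port the Rime server listens on
model_name = "mistv2" # example model; omit to use Rime's server default
language = "en"       # example language; omit to use Rime's server default
```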
Disable auth in the app configuration: the Rime API key sent in the request header handles authentication, and the Rime server validates the key directly.
- Run `cerebrium deploy` to deploy the Rime service. The deployment URL appears in the command output.
- Send requests to the HTTP Rime service using the deployment URL from the output:
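A hedged example request, assuming the deployment URL from the deploy output and a Rime TTS-style request schema; the path, speaker, and field names below are illustrative, so consult Rime's API documentation for the exact schema:

```shell
# Illustrative request; replace <deployment-url> with the URL from `cerebrium deploy`.
# The path and JSON fields follow Rime's TTS API shape and may differ for your model.
curl -X POST "https://<deployment-url>/v1/rime-tts" \
  -H "Authorization: Bearer $RIME_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from Cerebrium!", "speaker": "luna", "modelId": "mistv2"}' \
  --output output.wav
```

Note that the `Authorization` header carries the Rime API key, since Cerebrium auth is disabled for this app.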
Runtime Configuration
The `[cerebrium.runtime.rime]` section supports the following parameters:
| Option | Type | Default | Description |
|---|---|---|---|
| `port` | integer | required | Port the Rime server listens on. Typically 8001. |
| `model_name` | string | — | Rime model to load (e.g. `"arcana"`, `"mist"`, `"mistv2"`). Defaults to Rime's server default if not set. |
| `language` | string | — | Language code for the model (e.g. `"en"`, `"es"`). Defaults to Rime's server default if not set. |
Scaling and Concurrency
Rime services support independent scaling configuration:

- `min_replicas`: Minimum instances to maintain (0 for scale-to-zero). Recommended: 1.
- `max_replicas`: Maximum instances during high load.
- `replica_concurrency`: Concurrent requests per instance. Recommended: 3.
- `cooldown`: Time window (in seconds) that must pass at reduced concurrency before scaling down. Recommended: 50.
- `compute`: Instance type. Recommended: `AMPERE_A10`.
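The scaling options above can be sketched in `cerebrium.toml`; the section names below follow common Cerebrium config conventions and should be verified against the current docs, and `max_replicas = 5` is only an example ceiling:

```toml
# Illustrative scaling/hardware config (verify section names against Cerebrium docs)
[cerebrium.hardware]
compute = "AMPERE_A10"   # recommended instance type

[cerebrium.scaling]
min_replicas = 1         # keep one warm instance (0 enables scale-to-zero)
max_replicas = 5         # example ceiling for high load
replica_concurrency = 3  # recommended concurrent requests per instance
cooldown = 50            # seconds at reduced concurrency before scaling down
```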