Hyperparameter sweeps systematically test parameter combinations to find the best-performing model for the least compute or training time. This tutorial covers training Llama 3.2, using Wandb (Weights and Biases) to run hyperparameter sweeps and Cerebrium to scale experiments across serverless GPUs. View the final version on GitHub. Read the next section if you’re unfamiliar with sweeps.
Analogy: Pizza Topping Sweep
Forget about ML for a second. Imagine making pizzas to discover the best combination of toppings. Three variables are available:

- Type of cheese (mozzarella, cheddar, parmesan)
- Type of sauce (tomato, pesto)
- Extra topping (pepperoni, mushrooms, olives)

That gives 3 × 2 × 3 = 18 possible combinations, and one of them will taste the best. To find the tastiest pizza, try all combinations and rate them. This process is a hyperparameter sweep: the three hyperparameters are cheese, sauce, and extra topping.

Baking one pizza at a time takes hours. With 18 ovens, all the pizzas bake at once and the best one emerges in minutes. If a kitchen is a GPU, 18 GPUs run all the experiments in parallel. Cerebrium enables sweeps across 18 GPUs (or 1,000) to find the best model version fast.

Setup Cerebrium
If you don’t have a Cerebrium account, sign up, install the Cerebrium CLI, and initialize a project. The scaffolded project contains two key files:

- main.py - The entrypoint file where application code lives.
- cerebrium.toml - A configuration file for build and environment settings.
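The tutorial’s original setup commands were lost in extraction; a plausible reconstruction of the Cerebrium CLI flow (the project name here is illustrative) looks like:

```shell
pip install cerebrium            # install the Cerebrium CLI
cerebrium login                  # authenticate with your account
cerebrium init llama-sweep       # scaffold main.py and cerebrium.toml
```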
Setup Wandb
Weights & Biases (Wandb) tracks, visualizes, and manages machine learning experiments in real time. It logs hyperparameters, metrics, and results so you can compare models and optimize performance.

- Sign up for a free account, then log in to your Wandb account by running wandb login in your CLI.
- Copy your API key from the Wandb website and add it to Cerebrium Secrets:
  - Key: WANDB_API_KEY
  - Value: The API key you copied from the Wandb website.

Training Script
To train with Llama 3.2, you’ll need:

- Model access permission:
  - Visit the Llama 3.2 model page on Hugging Face
  - Accept all permissions
- A Hugging Face token:
  - Click your profile image (top right)
  - Select “Access token”
  - Create a new token if needed
- The token added to Cerebrium Secrets:
  - Key: HF_TOKEN
  - Value: Your Hugging Face token
  - Click “Save All Changes”

Create a requirements.txt file listing the training dependencies.
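The tutorial’s exact dependency pins weren’t preserved here; a plausible set covering the techniques used below (4-bit QLoRA via peft and bitsandbytes, Hugging Face datasets, and Wandb logging) would look something like:

```
torch
transformers
datasets
accelerate
peft
bitsandbytes
trl
wandb
```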
Update cerebrium.toml to include:

- The requirements.txt path
- Hardware requirements for training
- A 1-hour max timeout using response_grace_period
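As a rough sketch only, the shape of such a config is below; the section and field names are illustrative assumptions, so check the current Cerebrium configuration reference for the exact keys:

```toml
# Illustrative sketch of cerebrium.toml; verify key names against the docs.
[cerebrium.deployment]
name = "llama-sweep"
python_version = "3.11"

[cerebrium.hardware]
compute = "AMPERE_A10"          # GPU class assumed; size to your model
memory = 32.0

[cerebrium.scaling]
response_grace_period = 3600    # 1-hour max timeout, in seconds
```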
Add the training code to main.py. This code sets up a fine-tuning pipeline for a Large Language Model (specifically Llama 3.2) using several modern training techniques. It:

- Takes a dictionary of parameters for flexible training configurations (the hyperparameter sweep)
- Loads a customer support dataset from Hugging Face and formats it into chat template format
- Implements QLoRA (Quantized Low-Rank Adaptation) for efficient fine-tuning
- Uses Weights & Biases (Wandb) for experiment tracking, logging results to the Wandb dashboard
- Saves the final model to a Cerebrium volume and returns a “success” message

Then deploy the app with the Cerebrium CLI. Deployment:

- Sets up the environment with required packages
- Deploys the training script as an endpoint
- Returns a POST URL (save this for later)
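The entrypoint described above can be sketched as follows. This is a minimal, hedged sketch: Cerebrium passes the JSON body of the POST request to the function, and the parameter names, default values, and model id here are illustrative assumptions, with the heavy training steps left as comments.

```python
# Hedged sketch of main.py's entrypoint; names and defaults are illustrative.
DEFAULT_PARAMS = {
    "model_name": "meta-llama/Llama-3.2-1B-Instruct",  # assumed model id
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "max_seq_length": 512,
}


def train(**params):
    """Merge the sweep's parameters over the defaults and run one job."""
    config = {**DEFAULT_PARAMS, **params}  # sweep values override defaults
    # 1. wandb.init(project=..., config=config)         -> experiment tracking
    # 2. load the customer support dataset, apply the chat template
    # 3. load the model in 4-bit and attach LoRA adapters (QLoRA)
    # 4. train, logging metrics to the Wandb dashboard
    # 5. save the adapters to the Cerebrium volume
    return {"status": "success", "config": config}
```

Accepting arbitrary keyword arguments and merging them over a defaults dictionary is what makes the endpoint reusable across every sweep configuration.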
Hyperparameter Sweep
Create a run.py file for running locally, and add the following code:

- Create a .env file and add the Inference API key from the Cerebrium Dashboard.
- Update the Cerebrium endpoint with the correct project ID and function name. The URL is appended with “?async=true”, making it a fire-and-forget request that can run for up to 12 hours (see the Cerebrium docs for details).
- The Bayesian optimization sweep configuration searches through these hyperparameters:
- Learning rate (log uniform distribution between ~4.54e-5 and ~9.12e-4)
- Batch size (1, 2, or 4)
- Gradient accumulation steps (2, 4, or 8)
- LoRA parameters (r, alpha, and dropout)
- Maximum sequence length (512 or 1024)
- The sweep is created in the “Llama-3.2-Customer-Support” W&B project
- For each sweep iteration:
- Initializes a new W&B run
- Combines the sweep’s hyperparameters with fixed parameters (like model name and dataset)
- Sends the parameters to the Cerebrium endpoint, where training runs asynchronously
- Logs the results back to W&B
- Runs 10 experiments (10 concurrent GPUs is the limit on Cerebrium’s Hobby plan)
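The sweep configuration described above can be sketched as a W&B config dictionary. The learning-rate bounds come from the ranges quoted above, while the metric name and the specific LoRA value lists are illustrative assumptions:

```python
# Hedged sketch of the W&B sweep configuration; metric name and the
# LoRA value lists are assumptions, not taken from the tutorial.
sweep_config = {
    "method": "bayes",  # Bayesian optimization
    "metric": {"name": "eval/loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 4.54e-5,
            "max": 9.12e-4,
        },
        "per_device_train_batch_size": {"values": [1, 2, 4]},
        "gradient_accumulation_steps": {"values": [2, 4, 8]},
        "lora_r": {"values": [8, 16, 32]},
        "lora_alpha": {"values": [16, 32]},
        "lora_dropout": {"values": [0.0, 0.05, 0.1]},
        "max_seq_length": {"values": [512, 1024]},
    },
}
```

With a dictionary like this, wandb.sweep(sweep_config, project="Llama-3.2-Customer-Support") returns a sweep id, and wandb.agent(sweep_id, function=run_experiment, count=10) drives the 10 experiments, each call pulling new hyperparameters from wandb.config.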


Next Steps

- Export the model:
  - Copy it to AWS S3 using Boto3
  - Download it locally using the Cerebrium Python package
- Quality assurance:
  - Run CI/CD tests on model outputs
  - Use Cerebrium’s webhook functionality
- Deployment:
  - Create an inference endpoint
  - Load the model directly from the Cerebrium volume