Hyperparameter Tuning: HF Models
- Split the competition dataset into train (90%) and test (10%) splits.
- Split train further into train (90%) and valid (10%) splits.
- Keep only 20% of the resulting train split for the sweeps (see the split sketch after the sweep configuration below).
- Define the sweep configuration, shared by all models, as listed below:
```python
sweep_config = {
    # How to search the hyperparameter space
    "method": "random",
    # Metric used to judge how good a hyperparameter combination is
    "metric": {
        "name": "QWK",
        "goal": "maximize",
    },
    # Hyperparameters to tune
    "parameters": {
        "lr": {"distribution": "uniform", "min": 1e-5, "max": 1e-3},
        "weight_decay": {"distribution": "uniform", "min": 0.01, "max": 0.1},
        "num_epochs": {"values": [3, 4, 5]},
        "warmup_ratio": {"distribution": "uniform", "min": 0.01, "max": 0.1},
        "lr_scheduler_type": {"values": ["cosine", "linear"]},
        "batch_size": {"values": [8, 16, 32]},
    },
    # Early termination of poorly performing runs
    "early_terminate": {
        "type": "hyperband",
        "min_iter": 2,
    },
}
```
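The splitting scheme above can be reproduced roughly as in the sketch below. This is a minimal sketch only: it assumes the competition data is a CSV with a `score` column to stratify on, and the file path, column name, and random seed are illustrative rather than taken from the original setup.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical path to the competition training data
df = pd.read_csv("train.csv")

# 90% train / 10% test
train_df, test_df = train_test_split(
    df, test_size=0.10, stratify=df["score"], random_state=42
)

# 90% train / 10% valid from the remaining train split
train_df, valid_df = train_test_split(
    train_df, test_size=0.10, stratify=train_df["score"], random_state=42
)

# Keep only 20% of the training split for sweep runs
train_df, _ = train_test_split(
    train_df, train_size=0.20, stratify=train_df["score"], random_state=42
)
```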
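The sweep itself would then be registered and run with Weights & Biases. The sketch below is only an assumption about the surrounding harness: the project name, trial count, and the `train` stub are placeholders, and only `sweep_config` comes from the section above. The metric being maximized, QWK (quadratic weighted kappa), can be computed for example with `sklearn.metrics.cohen_kappa_score(..., weights="quadratic")` on the validation split.

```python
import wandb

def train():
    # One sweep trial: W&B injects the sampled hyperparameters via run.config
    with wandb.init() as run:
        config = run.config  # lr, weight_decay, num_epochs, warmup_ratio, ...
        # ... build tokenizer/model and fine-tune with these hyperparameters ...
        qwk = 0.0  # placeholder: quadratic weighted kappa on the valid split
        run.log({"QWK": qwk})

# Hypothetical project name and trial count
sweep_id = wandb.sweep(sweep_config, project="hf-model-sweeps")
wandb.agent(sweep_id, function=train, count=20)
```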
Optimal set of hyperparameters
- Learning Rate: 0.0001148
- Epochs: 5
- Batch Size: 16
- LR Scheduler: Cosine
- Warmup Ratio: 0.07821
- Weight Decay: 0.03224
Optimal set of hyperparameters
- Learning Rate: 0.00003715
- Epochs: 5
- Batch Size: 16
- LR Scheduler: Cosine
- Warmup Ratio: 0.04982
- Weight Decay: 0.05612
Optimal set of hyperparameters
- Learning Rate: 0.00007843
- Epochs: 3
- Batch Size: 32
- LR Scheduler: Linear
- Warmup Ratio: 0.02276
- Weight Decay: 0.05573