
Hyperparameter Tuning: HF Models


Setup

  1. Split competition dataset into train (90%) and test (10%)
  2. Split train into train (90%) and valid (10%).
  3. Keep only 20% of the train split for the sweeps.
  4. Define the sweep configuration used for all models, as listed below:
sweep_config = {
    # How to perform hyperparameter tuning
    "method": "random",
    
    # How to evaluate which hyperparameter combination is good
    "metric": {
        "name": "QWK",
        "goal": "maximize",
    },
    
    # Hyperparameters to tune
    "parameters": {
        # Hyperparameters that will change
        "lr": {"distribution": "uniform", "min": 1e-5, "max": 1e-3},
        "weight_decay": {"distribution": "uniform", "min": 0.01, "max": 0.1},
        "num_epochs": {"values": [3, 4, 5]},
        "warmup_ratio": {"distribution": "uniform", "min": 0.01, "max": 0.1},
        "lr_scheduler_type": {"values": ["cosine", "linear"]},
        "batch_size": {"values": [8, 16, 32]},
    },
    
    # Early stopping
    "early_terminate": {
        "type": "hyperband",
        "min_eter": 2,
    }
}
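
The sweep is then registered and run through the Weights & Biases API. Below is a minimal sketch of how the splits and the `sweep_config` above might be wired together; the file name, the `score` column, the project name, and the body of `train_one_run` are assumptions, not part of the original setup.

import pandas as pd
import wandb
from sklearn.model_selection import train_test_split
from sklearn.metrics import cohen_kappa_score

# Steps 1-3: reproduce the splits described above
# (file and column names are assumed).
df = pd.read_csv("train.csv")
train_df, test_df = train_test_split(df, test_size=0.10, random_state=42)
train_df, valid_df = train_test_split(train_df, test_size=0.10, random_state=42)
train_df = train_df.sample(frac=0.20, random_state=42)  # keep only 20% of train for the sweeps

def qwk(y_true, y_pred):
    """Quadratic weighted kappa, the metric the sweep maximizes."""
    return cohen_kappa_score(y_true, y_pred, weights="quadratic")

def train_one_run():
    """Train and evaluate one model with the hyperparameters the agent picks."""
    with wandb.init() as run:
        cfg = run.config  # cfg.lr, cfg.weight_decay, cfg.num_epochs, ...
        # ... fine-tune the chosen HF model on train_df using cfg ...
        # ... predict scores for valid_df ...
        preds = valid_df["score"]  # placeholder: real predictions come from the model
        run.log({"QWK": qwk(valid_df["score"], preds)})

# Step 4: register the sweep and launch an agent against it.
sweep_id = wandb.sweep(sweep_config, project="hf-hyperparameter-sweeps")  # project name assumed
wandb.agent(sweep_id, function=train_one_run, count=30)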

Results

DeBERTa-V3

(Figure: deberta-sweep-random sweep results)

Optimal set of hyperparameters

  • Learning Rate: 0.0001148
  • Epochs: 5
  • Batch Size: 16
  • LR Scheduler: Cosine
  • Warmup ratio: 0.07821
  • Weight Decay: 0.03224
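
As a reference for how these values map onto training code, here is a minimal sketch that feeds them into a Hugging Face TrainingArguments object; the output directory is an assumption, and the same mapping applies to the BigBird and Longformer values below.

from transformers import TrainingArguments

# DeBERTa-V3 arguments built from the tuned values above
# (output_dir is assumed; everything else comes from the sweep result).
deberta_args = TrainingArguments(
    output_dir="deberta-v3-tuned",
    learning_rate=1.148e-4,
    num_train_epochs=5,
    per_device_train_batch_size=16,
    lr_scheduler_type="cosine",
    warmup_ratio=0.07821,
    weight_decay=0.03224,
)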

BigBird RoBERTa

(Figure: bigbird-sweep-random sweep results)

Optimal set of hyperparameters

  • Learning Rate: 0.00003715
  • Epochs: 5
  • Batch Size: 16
  • LR Scheduler: Cosine
  • Warmup ratio: 0.04982
  • Weight Decay: 0.05612

LongFormer-4096

(Figure: longformer-sweep-random sweep results)

Optimal set of hyperparameters

  • Learning Rate: 0.00007843
  • Epochs: 3
  • Batch Size: 32
  • LR Scheduler: Linear
  • Warmup ratio: 0.02276
  • Weight Decay: 0.05573

Funnel-4096

Yoso-4096
