
Hyperparameter Tuning: HF Models


Setup

  1. Split competition dataset into train (90%) and test (10%)
  2. Split train into train (90%) and valid (10%).
  3. Keep only 20% of the train split for the sweeps.
  4. Define the sweep configuration used for all models, as listed below:
sweep_config = {
    # How to perform hyperparameter tuning
    "method": "random",
    
    # How to evaluate which hyperparameter combination is good
    "metric": {
        "name": "QWK",
        "goal": "maximize",
    },
    
    # Hyperparameters to tune
    "parameters": {
        # Hyperparameters that will change
        "lr": {"distribution": "uniform", "min": 1e-5, "max": 1e-3},
        "weight_decay": {"distribution": "uniform", "min": 0.01, "max": 0.1},
        "num_epochs": {"values": [3, 4, 5]},
        "warmup_ratio": {"distribution": "uniform", "min": 0.01, "max": 0.1},
        "lr_scheduler_type": {"values": ["cosine", "linear"]},
        "batch_size": {"values": [8, 16, 32]},
    },
    
    # Early stopping
    "early_terminate": {
        "type": "hyperband",
        "min_eter": 2,
    }
}
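
The sweep is then registered and run through the Weights & Biases API. Below is a minimal sketch of how the splits and the `sweep_config` above might be wired together; the file name, the `score` column, the project name, and the body of `train_one_run` are assumptions, not part of the original setup.

import pandas as pd
import wandb
from sklearn.model_selection import train_test_split
from sklearn.metrics import cohen_kappa_score

# Steps 1-3: reproduce the splits described above
# (file and column names are assumed).
df = pd.read_csv("train.csv")
train_df, test_df = train_test_split(df, test_size=0.10, random_state=42)
train_df, valid_df = train_test_split(train_df, test_size=0.10, random_state=42)
train_df = train_df.sample(frac=0.20, random_state=42)  # keep only 20% of train for the sweeps

def qwk(y_true, y_pred):
    """Quadratic weighted kappa, the metric the sweep maximizes."""
    return cohen_kappa_score(y_true, y_pred, weights="quadratic")

def train_one_run():
    """Train and evaluate one model with the hyperparameters the agent picks."""
    with wandb.init() as run:
        cfg = run.config  # cfg.lr, cfg.weight_decay, cfg.num_epochs, ...
        # ... fine-tune the chosen HF model on train_df using cfg ...
        # ... predict scores for valid_df ...
        preds = valid_df["score"]  # placeholder: real predictions come from the model
        run.log({"QWK": qwk(valid_df["score"], preds)})

# Step 4: register the sweep and launch an agent against it.
sweep_id = wandb.sweep(sweep_config, project="hf-hyperparameter-sweeps")  # project name assumed
wandb.agent(sweep_id, function=train_one_run, count=30)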

Results

DeBERTa-V3

(Figure: deberta-sweep-random sweep results)

Optimal set of hyperparameters

  • Learning Rate: 0.0001148
  • Epochs: 5
  • Batch Size: 16
  • LR Scheduler: Cosine
  • Warmup ratio: 0.07821
  • Weight Decay: 0.03224
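
As a reference for how these values map onto training code, here is a minimal sketch that feeds them into a Hugging Face TrainingArguments object; the output directory is an assumption, and the same mapping applies to the BigBird and Longformer values below.

from transformers import TrainingArguments

# DeBERTa-V3 arguments built from the tuned values above
# (output_dir is assumed; everything else comes from the sweep result).
deberta_args = TrainingArguments(
    output_dir="deberta-v3-tuned",
    learning_rate=1.148e-4,
    num_train_epochs=5,
    per_device_train_batch_size=16,
    lr_scheduler_type="cosine",
    warmup_ratio=0.07821,
    weight_decay=0.03224,
)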

BigBird RoBERTa

(Figure: bigbird-sweep-random sweep results)

Optimal set of hyperparameters

  • Learning Rate: 0.00003715
  • Epochs: 5
  • Batch Size: 16
  • LR Scheduler: Cosine
  • Warmup ratio: 0.04982
  • Weight Decay: 0.05612

LongFormer-4096

(Figure: longformer-sweep-random sweep results)

Optimal set of hyperparameters

  • Learning Rate: 0.00007843
  • Epochs: 3
  • Batch Size: 32
  • LR Scheduler: Linear
  • Warmup ratio: 0.02276
  • Weight Decay: 0.05573

Funnel-4096

Yoso-4096
