# Intro

Experiment configuration overrides are located in Hydra's config folder: `configs/experiments`.
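
As a minimal, hedged sketch (the entry-point script and root config name are assumptions, not files from this repository), a run would select one of these overrides on the Hydra command line:

```python
# minimal_train.py -- hypothetical entry point; the real script and root config
# name in this repository may differ.
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="configs", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Hydra merges the selected override from configs/experiments into cfg
    # before this function runs.
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    # Example invocation (assumed):
    #   python minimal_train.py +experiments=00-opt_a2c_alphabeta_sigmoid
    main()
```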

All recent experiments use the sigmoid reward, which proved superior to, for example, reward V3E.

We do not list here the experiments done for:
 - determining the best ALPHA and BETA values. We identified ALPHA=0.2 and BETA=0 as the best.
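
The exact definition of this sigmoid reward is not reproduced here. Purely as a hypothetical illustration of the shape involved (the roles given to ALPHA and BETA below are assumptions, not the repository's definition), it could look like:

```python
import numpy as np

ALPHA = 0.2  # best value identified in the ALPHA/BETA experiments
BETA = 0.0   # best value identified in the ALPHA/BETA experiments


def sigmoid_reward(score: float, alpha: float = ALPHA, beta: float = BETA) -> float:
    """Hypothetical sigmoid-shaped reward: squash a raw score through a logistic
    curve whose steepness is controlled by alpha, then mix in a linear term
    weighted by beta. The real reward used in these experiments may differ."""
    squashed = 1.0 / (1.0 + np.exp(-score / max(alpha, 1e-8)))
    return (1.0 - beta) * squashed + beta * score
```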

Some backed-up experiments are located in `configs/experiments/bak`.

# Hyper-parameter optimization with Optuna

Experiment names `<experiment_name>` are prefixed either by `00-opt` or `01-opt`.

The folder created in the `outputs` folder is named `E11_56_<experiment_name>`.

Hyper-parameter tuning experiments use Optuna, with several trials and multiple seeds per trial, to find the best hyper-parameters.
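
Purely as an illustration of that trial/seed structure (the `train_and_evaluate` stub stands in for the real training run, and the ranges simply mirror the A2C gamma range listed below), a multi-seed Optuna objective could look like this:

```python
import numpy as np
import optuna


def train_and_evaluate(gamma: float, learning_rate: float, n_steps: int, seed: int) -> float:
    """Placeholder for the real training/evaluation run (SB3 model + custom env).
    Here it just returns a dummy score so the sketch is self-contained."""
    rng = np.random.default_rng(seed)
    return float(-abs(gamma - 0.95) + rng.normal(scale=0.01))


def objective(trial: optuna.Trial) -> float:
    # Sample hyper-parameters; the gamma range mirrors 00-opt_a2c_alphabeta_sigmoid.
    gamma = trial.suggest_float("gamma", 0.90, 0.996, step=0.004)
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    n_steps = trial.suggest_int("n_steps", 4, 64)

    # Average the evaluation metric over several seeds to reduce seed variance.
    scores = [train_and_evaluate(gamma, learning_rate, n_steps, seed) for seed in (0, 1, 2)]
    return float(np.mean(scores))


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```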

Experiment config files located in `configs/experiments` are:

 - `00-opt_a2c_alphabeta_sigmoid`, gamma: range(0.90, 0.996, step=0.004). Best hyper-params: (learning_rate=0.0002, gamma=0.908, n_steps=18)
 - `00-opt_ddpg_alphabeta_sigmoid` (/!\ folder prefix `E11_55`). Best hyper-params: (learning_rate=0.0008, buffer_size=1000000, batch_size=512, gamma=0.94, noise_sigma=0.05, train_freq=1)
 - `01-opt_dqn_alphabeta_sigmoid`, gamma: range(0.01, 1, step=0.05) => best evaluated gamma is 0.76 (learning_rate=0.00054, batch_size=64, train_freq=128)
 - `01-opt_linucb_kfoofw_sigmoid_randomtrained_hlr`, learning_rate: range(0.8, 2, step=0.1), exploration coefficient not capped (up to 2.0); best is alpha=1.4
 - /!\ `01-opt_ppo_alphabeta_sigmoid`, gamma: range(0.01, 1, step=0.05). Best: (learning_rate=0.0005, batch_size=128, normalize_advantage=True, n_epochs=3, gamma=0.81) => WARNING: results to be updated in the paper
 - `00-opt_sac_alphabeta_sigmoid`, gamma: range(0.90, 0.996, step=0.005). Best: (learning_rate=0.000119, buffer_size=2000000, batch_size=512, gamma=0.995, train_freq=49, gradient_steps=-1)

Not retained:
 - `00-opt_dqn_alphabeta_sigmoid`, gamma: range(0.90, 0.996, step=0.005) => Not retained because the gamma range is too small
 - `01-opt_dqn_alphabeta_sigmoid_gamma0`, gamma=0 => Not retained; gamma=0 is not a good idea
 - `00-opt_linucb_kfoofw_alphabeta_sigmoid`, learning_rate: range(0.5, 1.2, step=0.1), capped exploration coefficient, sequential training => Not retained; sequential learning on the dataset works badly
 - `01-opt_linucb_kfoofw_sigmoid_randomtrained`, learning_rate: range(0.1, 1.2, step=0.1), capped exploration coefficient, sequential training => Not retained; alpha (badly named "learning_rate") has too small a range

# Basic parameter sweeps with constant hyper-parameters

Experiments with the suffix `_sweepseed` or `_sweepBETA` do NOT use Optuna.

They fix the hyper-parameters to the values found in the Optuna optimization experiments `00-opt_xxx` or `01-opt_xxx`.

Sweep experiments are named `<version_prefix>_<hydra_experiment>`.
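
A hedged sketch of what such a `_sweepseed` run amounts to, reusing the A2C values reported above; `CartPole-v1` and the training budget are placeholders, not this project's environment or setup:

```python
from stable_baselines3 import A2C


def run_seed_sweep(seeds=range(5)) -> None:
    """Train the same fixed-hyper-parameter A2C model under several seeds.
    "CartPole-v1" stands in for the project's real environment."""
    for seed in seeds:
        model = A2C(
            "MlpPolicy",
            "CartPole-v1",         # placeholder env id
            learning_rate=0.0002,  # values found by 00-opt_a2c_alphabeta_sigmoid
            gamma=0.908,
            n_steps=18,
            seed=seed,
            verbose=0,
        )
        model.learn(total_timesteps=10_000)  # placeholder training budget
        print(f"seed={seed} finished")


if __name__ == "__main__":
    run_seed_sweep()
```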

Experiments (run after hyper-parameter optimization) for the variability tests on the choice of the random seed, listed by folder name (`E11_<minorversion>_<experiment_name>`):
 - `E11_56_a2c_alphabeta_sweepseed`
 - `E11_56_ddpg_alphabeta_sweepseed`
 - `E11_56_linucb_kfoofw_randomtrained_hlr_sweepseed`, seed sweep with uncapped exploration coefficient (alpha=1.4)
 - `E11_56_linucb_kfoofw_sweepBETA`, OK, clearly BETA must be set to 0
 - `E11_56_sac_alphabeta_sweepseed`
 - `E11_57_dqn_alphabeta_gammaopti_sweepseed`, gamma=0.76 (we realized that lowering gamma below 0.9 is beneficial)
 - `E11_57_dqn_sweepBETA`, uses gamma=0.76
 - `E11_57_ppo_alphabeta_sweepseed`, with gamma=0.81 and the other hyper-parameters fixed
 - `E11_57_ppo_sweepBETA` => BETA=0 is the best; however, results do not decrease linearly as BETA increases up to 1 (unlike with LinUCB)

Experiments not retained:
 - `E11_56_dqn_alphabeta_sweepseed`, seed sweep with gamma=0.9
 - `E11_56_dqn_alphabeta_gamma0`, seed sweep with gamma=0
 - /!\ `E11_56_ppo_alphabeta_sweepseed` is wrong because the hyper-parameter choices are not the right ones

Experiments with a large gamma range, with `version_minor: 57 or 58` (*E11_57_xxx*):
 - `01-opt_a2c_alphabeta_sigmoid` => OK, results look very good. Best hyper-params: (learning_rate=0.0001, n_steps=13, gamma=0.91, gae_lambda=0.2)
 - `a2c_best_sweepseed`, ongoing => OK
 - `01-opt_ddpg_alphabeta_sigmoid` => OK, interesting results. Best params: (learning_rate=0.0005, buffer_size=500000, batch_size=128, gamma=0.86, tau=0.246, noise_sigma=0.35, train_freq=33)
 - `ddpg_best_sweepseed` => OK
 - `01-opt_sac_alphabeta_sigmoid` => KO, training time too long; stopped, and results lost

The results give DQN and A2C as the best NN algorithms. Interestingly, they are the only ones working with a discrete action space (in the Gym env and passed to the SB3 policy); the other NN algorithms use a continuous action space (DDPG, SAC, PPO). We therefore decided to rerun the same PPO experiment (`01-opt_ppo_alphabeta_sigmoid`) with a discrete action space.
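
The difference lies only in the `action_space` the env declares before it is passed to the SB3 policy; a toy sketch (using the Gymnasium API, which may differ from the Gym version used in this project):

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ToyDiscreteEnv(gym.Env):
    """Toy env with a discrete action space (as used by DQN/A2C here)."""

    def __init__(self, n_actions: int = 4):
        self.action_space = spaces.Discrete(n_actions)
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(3, dtype=np.float32), {}

    def step(self, action):
        obs = np.zeros(3, dtype=np.float32)
        reward, terminated, truncated = 0.0, True, False
        return obs, reward, terminated, truncated, {}


# A continuous version (as used by DDPG/SAC/PPO before the discrete PPO run)
# would instead declare:
#   self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)

env = ToyDiscreteEnv()
print(env.action_space)
```

In SB3, DQN requires a `Discrete` action space, DDPG and SAC require a `Box`, and PPO and A2C accept both, which is what makes a discrete PPO rerun possible.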

Experiment with a discrete (instead of continuous) action space for PPO, in `version_minor: 58` (*E11_58_xxx*). Not included in our first research paper, as these results came too late:
 - `02-opt_ppo_sigmoid_discrete_as` => Better results here! (learning_rate=0.0001, batch_size=64, normalize_advantage=False, n_epochs=24, gamma=0.86)
 - `ppo_as_discrete_sweepseed` => In the same league as A2C and DQN, even a bit better.