
RL-HAB

RL-HAB is a high altitude balloon (HAB) simulation environment for training altitude-controllable agents using reinforcement learning. The simulator is wrapped in the standard Gymnasium structure for training agents with classic open-source reinforcement learning libraries such as stable-baselines3, CleanRL, and RLlib. We provide examples of training and evaluating agents with DQN in stable-baselines3. We also provide optional integration with wandb for training and Optuna for automated hyperparameter tuning and analysis.
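
Below is a minimal sketch of what interacting with the wrapped environment looks like under the standard Gymnasium API; the registered environment id ("HAB-v0") is a placeholder, and the actual environment class and its settings live in env/ and env_config.py.

# Minimal Gymnasium interaction loop; "HAB-v0" is a hypothetical id,
# not necessarily the name registered by this repository.
import gymnasium as gym

env = gym.make("HAB-v0")
obs, info = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()                       # random altitude command
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()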

Altitude controllable high altitude balloons can strategically move up and down to different levels of the atmosphere to leverage changes in winds for limited horizontal control. A typical phenomenon in the atmosphere is opposing winds, where winds move in opposite directions at two different altitudes. HABs can leverage opposing winds to perform station-keeping maneuvers for persistent area coverage of a target region over hours, days, or weeks, which can help with surveillance, in-situ stratospheric meteorological data collection, and/or communication relays. With perfect weather forecasts this would be a simple deterministic path planning problem; however, forecasts frequently have large errors in wind direction (occasionally up to 180 degrees off) and also lack vertical and temporal resolution in the altitude region of interest, leading to significant uncertainty in dynamic flow fields. Additionally, wind diversity and opposing wind probabilities follow seasonal and geographical/hemispherical trends throughout the year. See "Wind Diversity Trends in the Lower Stratosphere Analyzed from Radiosondes launched in the Western Hemisphere" for more information on these global trends.

Google Loon was the first to demonstrate station-keeping using deep reinforcement learning and has open-sourced its simulation environment, the "Balloon Learning Environment (BLE)".

Major Differences between BLE and RL-HAB:

  • BLE uses a variational autoencoder (VAE) to generate unique dynamic forecasts for training and adds simplex noise to balloon dynamics. RL-HAB uses ERA5 reanalysis as the "forecast" and synthetic forecasts (see /SynthWinds) generated from aggregated radiosonde data as the ground truth.
  • BLE assumes superpressure (SP) balloon dynamics and resource constraints. RL-HAB uses simplified solar balloon dynamics without resource constraints.
  • RL-HAB evaluation is performed by month and region rather than as a single overall evaluation like in BLE.
    • BLE does an overall evaluation on the same "region and time period" (although forecasts come from a pre-trained VAE not included in the codebase). BLE includes different evaluation suites as well as strata_seed options, which can be a "natural distribution" (typical atmospheric condition frequencies, as in the suites tiny, small, medium, and big) or an "even distribution" (equal numbers of hard, medium, and easy station-keeping scenarios, as in the suite all_strata).
  • BLE includes a deterministic StationSeeker controller; RL-HAB includes an even simpler baseline controller.

Installation

  1. Install Anaconda3

    If your machine has a GPU, you can additionally install GPU support.

  2. Install Python Dependencies

    For easy install on WSL and Ubuntu use:

    pip install -r requirements.txt
    

    Setup RL-HAB Environment:

    pip3 install -e .
    

    Tested to work on Windows 11 WSL with the following:

    • Ubuntu 22.04
    • Python Version 3.11
    • Conda Version 23.7.4

Quickstart

Use the provided configuration file, formatted ERA5 forecast, synthetic forecast, and example trained model for evaluation.

See Docs for more detailed explanations on how to acquire and format forecasts, update dynamics, change parameters, etc.


Evaluate Pre-Trained Model or baseline controller

This is an example of evaluating a DUAL (ERA5 reanalysis observation, SYNTHetic forecast for movement) pre-trained model. Optionally set render_mode='human' in env_config.py for visualization. Complete ERA5 reanalysis forecast data is provided for the 2023 North America region, as well as January and July Synth winds and a model trained on the months of July and January.

To try different evaluations, update synth_netcdf and model_name in env_config.py.

ℹ️ Note The number of levels in the observation forecast must be equivalent across training and evaluation. For this example, an example Complete ERA5 reanalysis netCDF forecast for the USA region in 2023 was processed and reformatted using ERA5-Utils.
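
A rough illustration of the settings referenced above follows; the variable names (synth_netcdf, model_name, render_mode) come from this README, but the values and exact layout of env_config.py are placeholders.

# Illustrative placeholders only -- check env_config.py for the real structure and values.
synth_netcdf = "forecasts/SYNTH_NorthAmerica_Jan2023.nc"    # hypothetical synthetic-forecast path
model_name   = "models/DQN_DUAL_Jul_example"                # hypothetical pre-trained model name
render_mode  = "human"                                      # enables live visualization during evaluation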

python3 evaluation/evaluate.py

python3 env/baseline_controller.py

Train a New Model

Complete and Pressure ERA5 reanalysis forecast data is provided for the 2023 North America region, as well as January and July Synth winds.

python3 env/train-DQN.py
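
As a rough sketch of the kind of training run env/train-DQN.py wraps, the snippet below trains a DQN agent with stable-baselines3; the environment id, hyperparameters, and paths are placeholders rather than the script's actual values.

# Hedged training sketch -- "HAB-v0", the hyperparameters, and the paths are assumptions.
import gymnasium as gym
from stable_baselines3 import DQN

env = gym.make("HAB-v0")
model = DQN(
    "MlpPolicy",
    env,
    learning_rate=1e-4,
    buffer_size=100_000,
    tensorboard_log="logs_DUAL-Test",    # written where the tensorboard command below expects logs
)
model.learn(total_timesteps=1_000_000)
model.save("models/DQN_DUAL_example")    # hypothetical save path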

To monitor the status of model learning in real time with TensorBoard:

tensorboard --logdir logs_DUAL-Test


Preparing Forecasts for Simulation

See Docs

Hyperparameter Tuning and Optimization with Optuna

There are two examples of how to run Optuna hyperparameter tuning sessions.

optuna_config.py sets up several high-level variables that the other scripts use, including the project name and the directories where models and logs are stored. n_envs is how many vectorized environments to run (see https://stable-baselines.readthedocs.io/en/master/guide/examples.html for an example). n_threads specifies how many threaded copies of the script to run (equivalent to running the same script in N terminal windows). n_trials specifies how many trials to complete before the study is finished; if running multi-threaded, the total number of trials will be n_trials*n_threads.
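
An illustrative sketch of those settings is shown below; the variable names follow this description, but the values and exact layout of optuna_config.py are placeholders.

# Placeholder values -- see optuna_config.py for the real settings.
project_name = "RL-HAB-DQN-tuning"    # hypothetical study/project name
model_dir    = "models/optuna"        # where tuned models are stored
log_dir      = "logs/optuna"          # where training logs are stored

n_envs    = 4     # vectorized environments per trial
n_threads = 2     # parallel copies of the tuning script to launch
n_trials  = 50    # trials per thread; total trials = n_trials * n_threads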

To initialize the hyperparameter study and create an Optuna database file for storing results, first run

initialize_study.py

Then run optuna-multi-DUAL.py or optuna-multi-Single.py for a single session, or launch multiple copies of optuna-multi-*.py to run several threads at a time.

If running a multi-threaded case, be aware that there is a limit to available computation power; start small and ramp up. Beyond a certain point, adding more threads actually makes the simulation run slower and less efficiently, and that point is different for every machine.
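
The sketch below shows how a study can be created once and then shared by several worker processes through the SQLite database; the study name and objective function are placeholders based on the commands in this README, not the repository's exact code.

import optuna

storage = "sqlite:///db.sqlite3"          # same database the optuna-dashboard command points at

# initialize_study.py (run once) -- creates/loads the shared study
optuna.create_study(study_name="RL-HAB-DQN-tuning", storage=storage,
                    direction="maximize", load_if_exists=True)

# each optuna-multi-*.py worker then attaches to the same study
study = optuna.load_study(study_name="RL-HAB-DQN-tuning", storage=storage)
# study.optimize(objective, n_trials=n_trials)   # objective() trains and scores one model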

To see hyperparameter results and plots in a web browser, run:

optuna-dashboard sqlite:///db.sqlite3

(Figure: example output from hyperparameter tuning with Optuna.)

Notes/Discussion

  • Important notes on seeding (see the sketch after this list):
    • Specifying a seed with np.random.seed(seed) at the top level of a script will produce the same order of random numbers every time np.random is called. If using multiple threads/processes, the random numbers generated will not be identical across the processes, but the random numbers in each process will be the same every time the script is run.
    • To have the same random numbers generated across multiple threads/processes, set a seed for a dedicated random number generator, self.np_random = np.random.default_rng(seed), and then use self.np_random everywhere in place of np.random.
    • SB3 model declaration also takes an optional seed argument. If a seed is specified, the same random actions will be taken every time; we typically don't want this.
  • CPU and GPU are about equal in training time
  • Simulation updates/variations to try for improving learning performance:
    • Kinematics vs. no Kinematics
    • Different reward structures
    • Limit control frequency (currently every step)
    • Change observation space
      • Add or remove dimensions
      • Change to local coordinates (distance/bearing)
      • Change structure of "flow map"
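
The sketch below illustrates the per-environment seeding pattern described in the notes above; the class and method names are illustrative, not the repository's code.

import numpy as np

class ExampleEnv:
    def __init__(self, seed=None):
        # A per-instance generator, seeded explicitly, gives each process/thread
        # its own reproducible stream, unlike the shared global np.random state.
        self.np_random = np.random.default_rng(seed)

    def sample_wind_error(self):
        # Use self.np_random everywhere in place of the module-level np.random.
        return self.np_random.normal(loc=0.0, scale=1.0)

env_a = ExampleEnv(seed=42)
env_b = ExampleEnv(seed=42)
assert env_a.sample_wind_error() == env_b.sample_wind_error()   # identical streams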

Authors

  • Tristan Schuler - U.S. Naval Research Laboratory
  • Chinthan Prasad - U.S. Naval Research Laboratory
  • Georgiy Kiselev - U.S. Naval Research Laboratory (NREIP Intern)

About

RL-HAB is an open-source high altitude balloon (HAB) reinforcement learning simulation environment for training autonomous HAB agents.
