Water reservoir control problems are traditionally formulated as Markov Decision Processes, with a cost or reward associated with the system's state and the operator's action. The operator's optimal actions therefore minimize the cumulative discounted cost (or, equivalently, maximize the reward) over the planning horizon. Unfortunately, accurately modeling human-controlled reservoir operation remains difficult, and we often observe a discrepancy between historical data and model-based simulations. This problem can be partially attributed to an inaccurate description, or even the absence, of the “real” objective function guiding the operator's behavior. In this paper, we introduce an Inverse Reinforcement Learning algorithm to model the dynamics of the optimal water reservoir control problem, with the aim of recovering a reward function that best explains observed reservoir operation from historical expert demonstrations. Specifically, we employ a Deep Maximum Entropy Inverse Reinforcement Learning (Deep MaxEnt IRL) model, in which a deep neural network maps state-action pairs to rewards and the maximum entropy principle drives the updates of the network parameters. We then use a Boltzmann soft-max optimality update to derive the optimal control policy under the recovered reward function. Compared to existing implementations, our algorithm recovers the entire reward function mapping rather than only specific parameters, such as feature weights, making it deployable to different reservoirs without any a priori knowledge of the reward structure. We demonstrate the potential of the proposed algorithm by running a Monte Carlo simulation, based on the optimal control policy under the recovered reward function, on the Grand Coulee Dam in Washington State. The simulated releases closely match the actual releases on unseen test-period data, showcasing the algorithm's ability to capture the observed system dynamics.
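To illustrate the core training loop described above, here is a minimal Deep MaxEnt IRL sketch. It is not the repository code: it assumes a small tabular MDP with a known transition model `P`, one-hot state features, and expert trajectories given as lists of (state, action) pairs; the names `reward_net`, `soft_value_iteration`, `expected_svf`, and the constants `S`, `A`, `T`, `GAMMA` are all hypothetical. For brevity the sketch conditions the reward on state only, whereas the paper's network takes state-action pairs.

```python
# Hypothetical Deep MaxEnt IRL sketch (not the repository implementation).
# Assumes a tabular MDP: S states, A actions, transitions P[s, a, s'],
# one-hot state features, expert trajectories of (state, action) pairs.
import numpy as np
import torch
import torch.nn as nn
from scipy.special import logsumexp

S, A, T, GAMMA = 16, 4, 20, 0.95                    # states, actions, horizon, discount
P = np.random.dirichlet(np.ones(S), size=(S, A))    # placeholder transition model

reward_net = nn.Sequential(                          # maps a one-hot state to a scalar reward
    nn.Linear(S, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_net.parameters(), lr=1e-2)

def soft_value_iteration(r, n_iters=100):
    """Boltzmann soft-max optimality: V(s) = log sum_a exp(Q(s, a))."""
    V = np.zeros(S)
    for _ in range(n_iters):
        Q = r[:, None] + GAMMA * P @ V               # Q[s, a]
        V = logsumexp(Q, axis=1)                     # soft-max over actions
    return np.exp(Q - V[:, None])                    # pi(a | s) = exp(Q(s, a) - V(s))

def expected_svf(policy, p0):
    """Forward pass: expected state visitation frequencies under the policy."""
    mu = np.zeros((T, S))
    mu[0] = p0
    for t in range(1, T):
        mu[t] = np.einsum('s,sa,sak->k', mu[t - 1], policy, P)
    return mu.sum(axis=0)

def empirical_svf(trajectories):
    """Average state visitation counts in the expert demonstrations."""
    counts = np.zeros(S)
    for traj in trajectories:
        for s, _ in traj:
            counts[s] += 1
    return counts / len(trajectories)

def train_step(trajectories, p0):
    feats = torch.eye(S)                             # one-hot state features
    r = reward_net(feats).squeeze(-1)                # reward for every state
    policy = soft_value_iteration(r.detach().numpy())
    grad_mu = empirical_svf(trajectories) - expected_svf(policy, p0)
    # MaxEnt gradient: raise the reward where the expert visits more than the model.
    loss = -(torch.from_numpy(grad_mu).float() * r).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The soft value iteration inside the loop is what the abstract calls the Boltzmann soft-max optimality update: instead of a hard max over actions, the policy weights each action by exp(Q(s, a) - V(s)), which is the policy class under which the maximum entropy gradient (expert minus expected state visitation frequencies) is exact.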
To run the code, please use the included Jupyter Notebook. The notebook is well commented and self-explanatory. For questions, please do not hesitate to contact Jerry at zf245@cornell.edu.