Water reservoir control problems are traditionally formulated as Markov Decision Processes, with a cost or reward associated with the system's state and the operator's action. The operator's optimal actions therefore minimize the cumulative discounted cost (or, equivalently, maximize the reward) over the planning horizon. Unfortunately, accurately modeling human-controlled reservoir operation remains difficult, and we often observe a discrepancy between historical data and model-based simulations. This problem can be partially attributed to an inaccurate description, or even the absence, of the “real” objective function guiding the operator's behavior. In this paper, we introduce an Inverse Reinforcement Learning algorithm to model the dynamics of the optimal water reservoir control problem, with the aim of recovering a reward function that best explains observed reservoir operation from historical expert demonstrations. Specifically, we employ a Deep Maximum Entropy Inverse Reinforcement Learning (Deep MaxEnt IRL) model, in which a deep neural network maps state-action pairs to rewards and the maximum entropy principle drives the updates of the network parameters. We then use a Boltzmann soft-max optimality update to derive the optimal control policy under the recovered reward function. Compared to existing implementations, our algorithm recovers the entire reward function mapping rather than only specific parameters, such as feature weights, making it deployable to different reservoirs without any a priori knowledge of the reward structure. We demonstrate the potential of the proposed algorithm by running a Monte Carlo simulation, based on the optimal control policy under the recovered reward function, on the Grand Coulee Dam in Washington State. The simulated releases closely match the actual releases on unseen test-period data, showcasing the algorithm's ability to capture the observed system dynamics.
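To illustrate the core training loop described above, here is a minimal Deep MaxEnt IRL sketch. It is not the repository code: it assumes a small tabular MDP with a known transition model `P`, one-hot state features, and expert trajectories given as lists of (state, action) pairs; the names `reward_net`, `soft_value_iteration`, `expected_svf`, and the constants `S`, `A`, `T`, `GAMMA` are all hypothetical. For brevity the sketch conditions the reward on state only, whereas the paper's network takes state-action pairs.

```python
# Hypothetical Deep MaxEnt IRL sketch (not the repository implementation).
# Assumes a tabular MDP: S states, A actions, transitions P[s, a, s'],
# one-hot state features, expert trajectories of (state, action) pairs.
import numpy as np
import torch
import torch.nn as nn
from scipy.special import logsumexp

S, A, T, GAMMA = 16, 4, 20, 0.95                    # states, actions, horizon, discount
P = np.random.dirichlet(np.ones(S), size=(S, A))    # placeholder transition model

reward_net = nn.Sequential(                          # maps a one-hot state to a scalar reward
    nn.Linear(S, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_net.parameters(), lr=1e-2)

def soft_value_iteration(r, n_iters=100):
    """Boltzmann soft-max optimality: V(s) = log sum_a exp(Q(s, a))."""
    V = np.zeros(S)
    for _ in range(n_iters):
        Q = r[:, None] + GAMMA * P @ V               # Q[s, a]
        V = logsumexp(Q, axis=1)                     # soft-max over actions
    return np.exp(Q - V[:, None])                    # pi(a | s) = exp(Q(s, a) - V(s))

def expected_svf(policy, p0):
    """Forward pass: expected state visitation frequencies under the policy."""
    mu = np.zeros((T, S))
    mu[0] = p0
    for t in range(1, T):
        mu[t] = np.einsum('s,sa,sak->k', mu[t - 1], policy, P)
    return mu.sum(axis=0)

def empirical_svf(trajectories):
    """Average state visitation counts in the expert demonstrations."""
    counts = np.zeros(S)
    for traj in trajectories:
        for s, _ in traj:
            counts[s] += 1
    return counts / len(trajectories)

def train_step(trajectories, p0):
    feats = torch.eye(S)                             # one-hot state features
    r = reward_net(feats).squeeze(-1)                # reward for every state
    policy = soft_value_iteration(r.detach().numpy())
    grad_mu = empirical_svf(trajectories) - expected_svf(policy, p0)
    # MaxEnt gradient: raise the reward where the expert visits more than the model.
    loss = -(torch.from_numpy(grad_mu).float() * r).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The soft value iteration inside the loop is what the abstract calls the Boltzmann soft-max optimality update: instead of a hard max over actions, the policy weights each action by exp(Q(s, a) - V(s)), which is the policy class under which the maximum entropy gradient (expert minus expected state visitation frequencies) is exact.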
To run the code, please use the included Jupyter Notebook. The notebook is well commented and self-explanatory. For questions, please do not hesitate to contact Jerry at zf245@cornell.edu.