Abstract
Approximate dynamic programming (ADP) is the standard tool for solving multistage dynamic optimization problems under general conditions, such as a nonlinear state equation and cost, and continuous state and control spaces. In the typical ADP implementation, the value function is approximated by a single model trained over a suitable sampling of the state space. In this paper we investigate the ensemble learning paradigm in the ADP context, which consists in combining the outputs of many models trained to approximate the value function. To this end, we introduce an optimization scheme for the aggregation of the ensemble outputs, related to the supremum norm error on which the ADP accuracy depends. Furthermore, we show that the ensemble of value function approximations can be used to identify a priori good state points with which to train the approximating models, exploiting an ambiguity-like term tailored to the proposed ensemble optimization scheme. The advantages of ensembles in ADP are showcased both through error analysis and a simulation campaign involving various test problems. Our results show that ensembles obtained through the proposed output-weight optimization scheme yield more accurate and robust value function approximations than single models. At the same time, we show how the ensembles can successfully be employed to select good state samples as the training set for the value function approximations.
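The aggregation idea described above can be illustrated with a minimal sketch. The function names, the restriction to two models, and the grid-search strategy below are illustrative assumptions, not the paper's algorithm: convex output weights are chosen to minimize the supremum-norm error of the aggregated value function approximation over a set of sampled state points.

```python
# Hypothetical sketch (not the paper's algorithm): choose convex output
# weights for a two-model ensemble so that the supremum-norm error of the
# aggregated prediction over sampled state points is minimized, via a
# simple grid search over the weight simplex.

def sup_norm_weights(predictions, targets, steps=1000):
    """Grid-search convex weights for two models minimizing max |error|."""
    p0, p1 = predictions
    best_alpha, best_err = 0.0, float("inf")
    for k in range(steps + 1):
        alpha = k / steps
        # Supremum-norm error of the alpha-weighted ensemble output.
        err = max(abs(alpha * a + (1 - alpha) * b - y)
                  for a, b, y in zip(p0, p1, targets))
        if err < best_err:
            best_alpha, best_err = alpha, err
    return (best_alpha, 1 - best_alpha), best_err

# Toy example: two models whose errors cancel (+0.2 and -0.2 offsets),
# so equal weights drive the supremum-norm error to zero.
targets = [1.0, 2.0, 3.0]
preds = ([y + 0.2 for y in targets], [y - 0.2 for y in targets])
weights, err = sup_norm_weights(preds, targets)
```

In a full implementation the grid search would be replaced by a linear program (minimize a bound t subject to the absolute aggregated errors being at most t, with nonnegative weights summing to one), which handles ensembles of arbitrary size.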