Abstract

Model-based reinforcement learning is expected to safely acquire the optimal policy under real-world conditions by using a stochastic dynamics model for planning. Since the stochastic dynamics model of the real world is generally unknown, it must be learned from state-transition data. However, model learning suffers from a bias-variance trade-off. Conventional model learning is formulated as minimization of an expected loss, and ignoring higher-order statistics of the loss can lead to fatal errors in long-term model prediction. Although various methods have been proposed to handle bias and variance explicitly, this paper first formulates a new loss function tailored to sequential training of deep neural networks. To make the bias-variance trade-off explicit, a new multi-objective optimization problem with augmented weighted Tchebycheff scalarization is proposed. In this problem, the bias-variance trade-off can be balanced by adjusting a weight hyperparameter, although its optimal value is task-dependent and unknown. We additionally propose a general-purpose and efficient meta-optimization method for the hyperparameter(s). Based on the validation result at each epoch, the proposed meta-optimization adjusts the hyperparameter(s) toward the preferred solution simultaneously with model learning. In our case, it balances the bias-variance trade-off so as to maximize long-term prediction ability. The proposed method was applied to two simulated environments with uncertainty, and the numerical results showed that a bias-variance balance of the stochastic model suitable for long-term prediction can be achieved.
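For reference, the augmented weighted Tchebycheff scalarization of a two-objective problem (e.g., a bias-related loss $f_1$ and a variance-related loss $f_2$) generally takes the form below; the exact objective terms and reference point used in the paper are not given in this excerpt, so the notation is illustrative:

\[
\min_{\theta}\; \max_{i\in\{1,2\}} w_i \,\bigl| f_i(\theta) - z_i^{*} \bigr| \;+\; \rho \sum_{i\in\{1,2\}} w_i \,\bigl| f_i(\theta) - z_i^{*} \bigr|,
\]

where $w_i \ge 0$ are weights with $\sum_i w_i = 1$, $z^{*}$ is a (utopian) reference point, and $\rho > 0$ is a small augmentation constant. Shifting the weight selects different Pareto-optimal trade-offs, which is how the bias-variance balance is tuned.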

Highlights

  • Reinforcement learning (RL) [1] is one of the promising methods for robots to adaptively acquire their own policies in the real world

  • This paper proposes a stochastic model learning method that can adjust the bias-variance trade-off of the stochastic model according to a higher-level objective

  • The proposed method consists of a loss function derived from a two-step multi-objective optimization (MOO) problem with inter-data and statistic-perspective objectives, and a meta-optimization of the hyperparameter(s); a minimal code sketch of the scalarized loss is given below

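As a rough sketch of how such a scalarized loss could be computed from two statistic-perspective terms (the paper's exact loss is not reproduced in this excerpt; `bias_loss`, `var_loss`, `ref`, and `rho` below are illustrative assumptions), a PyTorch-style implementation of the augmented weighted Tchebycheff scalarization might look like this:

```python
import torch

def tchebycheff_loss(bias_loss, var_loss, w, ref=(0.0, 0.0), rho=1e-3):
    """Augmented weighted Tchebycheff scalarization of two loss terms.
    `bias_loss` and `var_loss` are scalar tensors (e.g., an accuracy-related
    term and a variance-related term of the stochastic model); `w` in [0, 1]
    is the trade-off weight to be meta-optimized; `ref` is a reference
    (utopia) point and `rho` a small augmentation constant."""
    devs = torch.stack([w * (bias_loss - ref[0]).abs(),
                        (1.0 - w) * (var_loss - ref[1]).abs()])
    # The max term drives both objectives toward the reference point; the
    # small weighted sum (augmentation) avoids weakly Pareto-optimal solutions.
    return devs.max() + rho * devs.sum()
```

The returned scalar can be backpropagated like any other loss, while the weight `w` itself is treated as a hyperparameter to be adjusted by the meta-optimization.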

Summary

INTRODUCTION

Reinforcement learning (RL) [1] is one of the promising methods for robots to adaptively acquire their own policies in the real world. The contributions of this paper are threefold: 1) formulation of the bias-variance trade-off as a MOO problem; 2) development of a general-purpose and efficient meta-optimization method; and 3) numerical verification of the proposed formulation with the meta-optimization on two simulated environments whose uncertainty stems from human operation and the presence of other agents.

According to a user-desired (high-level) meta-objective (e.g., generalization across different tasks, or long-term prediction accuracy as in our setting), meta-optimization methods aim to optimize hyperparameters in the learning algorithm and/or the low-level loss function. Because minimizing the low-level loss function is generally computationally expensive, owing to the large datasets needed to train DNNs, the meta-optimization method should be highly efficient. A further requirement is arbitrariness of the target: when dealing with MOO problems such as the bias-variance trade-off, the meta-objective used to select one solution from the Pareto set cannot be assumed in advance. The meta-objective therefore only needs to return a numerical scalar value as an evaluation of the low-level learners, and no assumption about its form or differentiability is required.
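Since only a scalar validation score is required, a derivative-free, epoch-wise search is sufficient in principle. The following Python sketch illustrates such a loop under assumed callables `train_one_epoch(w)` and `meta_objective()` (e.g., multi-step rollout prediction error on validation data); the simple accept/reject rule is an illustration, not the paper's actual meta-optimization algorithm.

```python
import random

def meta_optimize(train_one_epoch, meta_objective,
                  w_init=0.5, num_epochs=100, step=0.05):
    """Derivative-free, epoch-wise adjustment of a scalar hyperparameter w
    (e.g., the trade-off weight of the scalarized loss). Only a scalar
    validation score is needed, so no assumption is made about the
    meta-objective's form or differentiability."""
    w, best = w_init, float("inf")
    for _ in range(num_epochs):
        # Propose a small perturbation of the hyperparameter, clipped to [0, 1].
        candidate = min(1.0, max(0.0, w + random.uniform(-step, step)))
        # Low-level learning: one epoch of model training under the candidate.
        train_one_epoch(candidate)
        # Meta-objective: any scalar evaluation of the trained model, e.g.,
        # long-term (multi-step rollout) prediction error on validation data.
        score = meta_objective()
        if score < best:  # keep the candidate only if validation improves
            w, best = candidate, score
    return w, best
```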

STOCHASTIC MODEL LEARNING IN MARKOV
3) Summary of proposed losses
META-OPTIMIZATION OF HYPERPARAMETER
META-OBJECTIVE
Findings
CONCLUSION
