Abstract

Model-based reinforcement learning is expected to safely acquire the optimal policy under real-world conditions by using a stochastic dynamics model for planning. Since the stochastic dynamics model of the real world is generally unknown, it must be learned from state-transition data. However, model learning suffers from a bias-variance trade-off. Conventional model learning is formulated as minimization of an expected loss, and ignoring higher-order statistics of the loss can lead to fatal errors in long-term model prediction. Although various methods have been proposed to handle bias and variance explicitly, this paper first formulates a new loss function tailored to sequential training of deep neural networks. To make the bias-variance trade-off explicit, a new multi-objective optimization problem with augmented weighted Tchebycheff scalarization is proposed. In this problem, the bias-variance trade-off can be balanced by adjusting a weight hyperparameter, although its optimal value is task-dependent and unknown. We additionally propose a general-purpose and efficient meta-optimization method for the hyperparameter(s). Based on the validation result at each epoch, the proposed meta-optimization adjusts the hyperparameter(s) toward the preferred solution simultaneously with model learning. In our case, it balances the bias-variance trade-off so as to maximize long-term prediction ability. The proposed method was applied to two simulated environments with uncertainty, and the numerical results showed that a bias-variance balance of the stochastic model suitable for long-term prediction can be achieved.
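For reference, the augmented weighted Tchebycheff scalarization of a two-objective problem (e.g., a bias-related loss $f_1$ and a variance-related loss $f_2$) generally takes the form below; the exact objective terms and reference point used in the paper are not given in this excerpt, so the notation is illustrative:

\[
\min_{\theta}\; \max_{i\in\{1,2\}} w_i \,\bigl| f_i(\theta) - z_i^{*} \bigr| \;+\; \rho \sum_{i\in\{1,2\}} w_i \,\bigl| f_i(\theta) - z_i^{*} \bigr|,
\]

where $w_i \ge 0$ are weights with $\sum_i w_i = 1$, $z^{*}$ is a (utopian) reference point, and $\rho > 0$ is a small augmentation constant. Shifting the weight selects different Pareto-optimal trade-offs, which is how the bias-variance balance is tuned.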

Highlights

  • Reinforcement learning (RL) [1] is one of the promising methods for robots to adaptively acquire their own policies in the real world

  • This paper proposes a stochastic model learning method that can adjust the bias-variance trade-off of the stochastic model according to a higher-level objective

  • The proposed method consists of a loss function derived from a two-step multi-objective optimization (MOO) problem with inter-data and statistic-perspective objectives, and a meta-optimization of the hyperparameter(s); a minimal code sketch of the scalarized loss is given below

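As a rough sketch of how such a scalarized loss could be computed from two statistic-perspective terms (the paper's exact loss is not reproduced in this excerpt; `bias_loss`, `var_loss`, `ref`, and `rho` below are illustrative assumptions), a PyTorch-style implementation of the augmented weighted Tchebycheff scalarization might look like this:

```python
import torch

def tchebycheff_loss(bias_loss, var_loss, w, ref=(0.0, 0.0), rho=1e-3):
    """Augmented weighted Tchebycheff scalarization of two loss terms.
    `bias_loss` and `var_loss` are scalar tensors (e.g., an accuracy-related
    term and a variance-related term of the stochastic model); `w` in [0, 1]
    is the trade-off weight to be meta-optimized; `ref` is a reference
    (utopia) point and `rho` a small augmentation constant."""
    devs = torch.stack([w * (bias_loss - ref[0]).abs(),
                        (1.0 - w) * (var_loss - ref[1]).abs()])
    # The max term drives both objectives toward the reference point; the
    # small weighted sum (augmentation) avoids weakly Pareto-optimal solutions.
    return devs.max() + rho * devs.sum()
```

The returned scalar can be backpropagated like any other loss, while the weight `w` itself is treated as a hyperparameter to be adjusted by the meta-optimization.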

Summary

INTRODUCTION

Reinforcement learning (RL) [1] is one of the promising methods for robots to adaptively acquire their own policies in the real world. The contributions of this paper are threefold: 1) formulation of the bias-variance trade-off as a MOO problem; 2) development of a general-purpose and efficient meta-optimization method; and 3) numerical verification of the proposed formulation with the meta-optimization on two simulated environments whose uncertainty stems from human operation and the presence of other agents.

According to a user-desired (high-level) meta-objective (e.g., generalization across different tasks, or long-term prediction accuracy as in our setting), meta-optimization methods aim to optimize hyperparameters in the learning algorithm and/or the low-level loss function. Because minimizing the low-level loss function is generally computationally expensive, owing to the large datasets needed to train DNNs, the meta-optimization method should be highly efficient. A further requirement is arbitrariness of the target: when dealing with MOO problems such as the bias-variance trade-off, the meta-objective used to select one solution from the Pareto set cannot be assumed in advance. The meta-objective therefore only needs to return a numerical scalar value as an evaluation of the low-level learners, and no assumption about its form or differentiability is required.
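Since only a scalar validation score is required, a derivative-free, epoch-wise search is sufficient in principle. The following Python sketch illustrates such a loop under assumed callables `train_one_epoch(w)` and `meta_objective()` (e.g., multi-step rollout prediction error on validation data); the simple accept/reject rule is an illustration, not the paper's actual meta-optimization algorithm.

```python
import random

def meta_optimize(train_one_epoch, meta_objective,
                  w_init=0.5, num_epochs=100, step=0.05):
    """Derivative-free, epoch-wise adjustment of a scalar hyperparameter w
    (e.g., the trade-off weight of the scalarized loss). Only a scalar
    validation score is needed, so no assumption is made about the
    meta-objective's form or differentiability."""
    w, best = w_init, float("inf")
    for _ in range(num_epochs):
        # Propose a small perturbation of the hyperparameter, clipped to [0, 1].
        candidate = min(1.0, max(0.0, w + random.uniform(-step, step)))
        # Low-level learning: one epoch of model training under the candidate.
        train_one_epoch(candidate)
        # Meta-objective: any scalar evaluation of the trained model, e.g.,
        # long-term (multi-step rollout) prediction error on validation data.
        score = meta_objective()
        if score < best:  # keep the candidate only if validation improves
            w, best = candidate, score
    return w, best
```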

STOCHASTIC MODEL LEARNING IN MARKOV
3) Summary of proposed losses
META-OPTIMIZATION OF HYPERPARAMETER
META-OBJECTIVE
Findings
CONCLUSION
