Abstract

The process of machine learning is to find the parameters that minimize a cost function constructed from the training data. This is called optimization, and the resulting parameters are called the optimal parameters of the neural network. In the search for the optimum, there have been attempts to solve the optimization symmetrically or to initialize the parameters symmetrically. Furthermore, to obtain the optimal parameters, existing methods decrease the learning rate over the iteration time or change it according to a fixed ratio; these schedules decrease monotonically at a constant rate as a function of the iteration time. Our idea is to make the learning rate changeable, unlike the monotonically decreasing schedules. We introduce a method that finds the optimal parameters by adaptively changing the learning rate according to the value of the cost function. When the cost function is minimized, learning is complete and the optimal parameters are obtained. This paper proves that the method converges to the optimal parameters, which means that it reaches a minimum of the cost function (effective learning). Numerical experiments demonstrate that learning is effective in various situations when the proposed learning rate schedule is used.
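As a rough, hypothetical sketch of the idea described in the abstract (not the exact rule analyzed in the paper), the snippet below updates parameters with a learning rate that depends on the current value of the cost function rather than on the iteration count. The quadratic cost and the specific scaling c / (1 + c) are illustrative assumptions only.

```python
# Illustrative sketch: a learning rate that adapts to the current cost value,
# in contrast to schedules that decay with the iteration count.
# The scaling rule below is an assumption for demonstration, not the paper's rule.
import numpy as np

def cost(w):
    # Simple quadratic cost with minimum at w = (1, -2).
    return 0.5 * ((w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2)

def grad(w):
    # Gradient of the quadratic cost above.
    return np.array([w[0] - 1.0, w[1] + 2.0])

w = np.array([5.0, 5.0])
base_lr = 0.1
for t in range(200):
    c = cost(w)
    lr = base_lr * c / (1.0 + c)  # hypothetical rule: lr shrinks as the cost approaches its minimum
    w = w - lr * grad(w)

print(w, cost(w))
```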

Highlights

  • Machine learning is carried out by using a cost function to measure how accurately a model learns from the data and by determining the parameters that minimize this cost function

  • This paper proves its convergence when used with the Adam method

  • To solve this problem, existing methods used a learning rate that decreases monotonically at a constant rate according to the iteration time


Summary

Introduction

Machine learning is carried out by using a cost function to measure how accurately a model learns from the data and by determining the parameters that minimize this cost function. With larger sets of training data and more complex training models, the cost function may have many local minima, and the simple gradient descent method fails at a local minimum because the gradient vanishes at this point. To solve this problem, gradient-based methods in which learning continues even when the gradient is zero have been introduced, such as momentum-based methods. Since the learning rate set initially is a constant, the gradient may not be scaled appropriately during learning. To address this, other methods have been developed to schedule the learning rate, such as step-based and time-based methods, in which the learning rate is not a constant but a function that becomes smaller as learning progresses [11,12,13,14]. The numerical tests include Weber's function experiments, which test behavior in multidimensional spaces with local minima, binary classification experiments, and classification experiments with several classes [28]. Common forms of these iteration-dependent schedules are sketched below.
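A minimal sketch of the standard iteration-dependent schedules mentioned above (time-based, step-based, and exponential decay); the decay constants are chosen only for illustration and are not values from the paper.

```python
# Standard monotonically decreasing learning-rate schedules as functions of the
# iteration count t. Decay constants are illustrative choices.
import math

def time_based(lr0, t, decay=0.01):
    # Learning rate decreases as 1 / (1 + decay * t).
    return lr0 / (1.0 + decay * t)

def step_based(lr0, t, drop=0.5, step_size=10):
    # Learning rate is cut by the factor 'drop' every 'step_size' iterations.
    return lr0 * drop ** math.floor(t / step_size)

def exponential(lr0, t, k=0.05):
    # Learning rate decays exponentially with the iteration count.
    return lr0 * math.exp(-k * t)

for t in (0, 10, 50):
    print(t, time_based(0.1, t), step_based(0.1, t), exponential(0.1, t))
```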

Machine Learning Method
Direction Method
Gradient Descent Method
Momentum Method
Learning Rate Schedule
Time-Based Learning Rate Schedule
Step-Based Learning Rate Schedule
Exponential-Based Learning Rate Schedule
Adaptive Optimization Methods
The Proposed Method
Numerical Tests
Two-Variable Function Test Using Weber’s Function
Case 1
Case 2
MNIST with MLP
Conclusions
