Abstract
Optimization problems occupy a central position in machine learning: a great deal of machine learning algorithms ultimately reduce to solving an optimization problem. Among optimization algorithms, gradient methods are the simplest and most commonly used, compared with algorithms such as Particle Swarm Optimization and Ant Colony Optimization. Within the gradient methods, Adaptive Moment Estimation (Adam) and stochastic gradient descent (SGD) are both outstanding algorithms that have been used to solve a wide range of deep learning tasks. However, which one performs better under which conditions remains unclear, so practitioners often have to try several optimizers to find the best choice. Building on previous research, this paper studies the impact of L2 regularization and weight decay in Adam and in SGD with momentum; it turns out that in adaptive methods, L2 regularization is not as effective as it is in SGD. This gives the intuition that SGD should outperform Adam on image classification tasks. However, this paper finds that things go the other way around in an experiment using LeNet-5 on MNIST. In addition, this paper describes an experiment on Fashion-MNIST using a DCGAN with both Adam and SGD as optimizers for the generator and the discriminator. The result shows that the generator trained with SGD produces fake images of higher quality.
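To make the L2 regularization versus weight decay distinction mentioned above concrete, the following is a minimal sketch (not taken from the paper) contrasting an L2 penalty folded into the gradient with a decoupled weight decay step, for both SGD with momentum and Adam. The function names and hyperparameter defaults (lr, momentum, betas, eps) are illustrative assumptions, not the paper's settings.

```python
# Sketch only: contrasts L2 regularization (penalty added to the gradient)
# with decoupled weight decay, for SGD with momentum and for Adam.
import numpy as np

def sgd_momentum_step(w, grad, buf, lr=0.01, momentum=0.9, l2=0.0, wd=0.0):
    g = grad + l2 * w             # L2 penalty enters the gradient ...
    buf = momentum * buf + g      # ... and flows through the momentum buffer
    w = w - lr * buf
    w = w - lr * wd * w           # decoupled weight decay shrinks w directly
    return w, buf

def adam_step(w, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
              l2=0.0, wd=0.0):
    g = grad + l2 * w                         # L2 term is later rescaled by
    m = betas[0] * m + (1 - betas[0]) * g     # 1/sqrt(v), so weights with large
    v = betas[1] * v + (1 - betas[1]) * g**2  # gradients are decayed less
    m_hat = m / (1 - betas[0] ** t)           # bias-corrected first moment
    v_hat = v / (1 - betas[1] ** t)           # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    w = w - lr * wd * w                       # decoupled decay (AdamW-style)
    return w, m, v
```

The sketch illustrates the effect the abstract refers to: when the L2 term is added to the gradient, Adam's per-coordinate rescaling weakens the regularization for parameters with large gradient magnitudes, whereas the decoupled decay step shrinks all weights uniformly, as it does in SGD.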