Abstract

Even today, when Deep Learning (DL) achieves state-of-the-art performance across a wide range of research domains, accelerating training and building robust DL models remain challenging tasks. To this end, generations of researchers have sought training methods for DL architectures that are less sensitive to weight distributions, model architectures, and loss landscapes. However, such methods are largely limited to adaptive learning rate optimizers, initialization schemes, and gradient clipping, without investigating the fundamental parameter update rule itself. Although multiplicative updates contributed significantly to the early development of machine learning and enjoy strong theoretical guarantees, to the best of our knowledge this is the first work to investigate them in a unified manner in the context of DL optimization. In this work, we propose an optimization framework that accommodates a wide range of optimization algorithms and enables alternative update rules to be applied. Within this framework, we propose a novel multiplicative update rule and extend its capabilities by combining it with a traditional additive update term in a novel hybrid update method. We claim that the proposed framework accelerates training and leads to more robust models than the traditionally used additive update rule, and we experimentally demonstrate its effectiveness on a wide range of tasks and optimization methods. These tasks range from convex and non-convex optimization to challenging image classification benchmarks, covering a wide range of commonly used optimization methods and Deep Neural Network (DNN) architectures, with both quantitative and qualitative experimental results.
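The abstract only names the update rules it studies; as a rough illustration of the distinction it draws, the sketch below contrasts a classical additive (gradient-descent) step with a generic exponentiated-gradient-style multiplicative step and a simple hybrid mix of the two on a toy objective. The objective, learning rate, and mixing coefficient `alpha` are illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch (not the paper's exact rule, which the abstract does not specify):
# additive SGD, an exponentiated-gradient-style multiplicative update, and a
# hypothetical hybrid mixing the two, on a toy quadratic objective.
import numpy as np

target = np.array([2.0, 0.5, 1.5])  # assumed toy minimizer

def grad(theta):
    # Gradient of f(theta) = 0.5 * ||theta - target||^2 (toy objective).
    return theta - target

def additive_step(theta, g, lr=0.1):
    # Classical additive update: theta <- theta - lr * g
    return theta - lr * g

def multiplicative_step(theta, g, lr=0.1):
    # Exponentiated-gradient-style update: each parameter is rescaled by a
    # gradient-dependent factor, preserving the sign of theta.
    return theta * np.exp(-lr * g)

def hybrid_step(theta, g, lr=0.1, alpha=0.5):
    # Hypothetical hybrid: convex combination of the two updates
    # (alpha is an assumed mixing coefficient, not taken from the paper).
    return alpha * multiplicative_step(theta, g, lr) + (1 - alpha) * additive_step(theta, g, lr)

theta0 = np.array([0.5, 0.5, 0.5])
for step in (additive_step, multiplicative_step, hybrid_step):
    x = theta0.copy()
    for _ in range(100):
        x = step(x, grad(x))
    print(step.__name__, x)  # all three converge to `target` on this toy problem
```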
