The optimization of machine learning models remains an open research question, since an optimal procedure for changing the learning rate throughout training is still unknown. Manually defining a learning rate schedule involves troublesome, time-consuming trial-and-error procedures to determine hyperparameters, such as learning rate decay epochs and decay rates, in order to navigate the intricate landscape of loss functions. Although adaptive learning rate optimizers automate this process, recent studies suggest they may cause overfitting and reduce performance compared to fine-tuned learning rate schedules. Considering that modern machine learning loss landscapes contain far more saddle points than local minima, we propose the Training Aware Sigmoidal Optimizer (TASO), an automated two-phase learning rate adaptation mechanism that significantly reduces the need for manual hyperparameter tuning. The first phase uses a high learning rate to quickly traverse the numerous saddle points in the error surface, while the second phase uses a low learning rate to gradually approach the center of the local minimum found previously. We compared the proposed approach with commonly used adaptive learning rate optimizers such as Adam, RMSProp, and Adagrad. Validation experiments on image and text datasets showed that TASO outperformed all competing methods in both optimal (i.e., with hyperparameter validation) and suboptimal (i.e., using default hyperparameters) scenarios. In our benchmark tests, TASO achieved an average 8.32% increase in accuracy and a 46.62% decrease in training loss across datasets and models, positioning it ahead of well-established adaptive optimizers and suggesting greater effectiveness and consistency in performance.
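To make the two-phase idea concrete, the following is a minimal sketch of a sigmoidal high-to-low learning rate schedule of the kind described above; the function name, parameters (lr_high, lr_low, steepness), and the midpoint placement of the transition are illustrative assumptions, not the exact TASO formulation from the paper.

```python
import math

def sigmoidal_lr(epoch, total_epochs, lr_high=0.1, lr_low=0.001, steepness=10.0):
    """Illustrative two-phase sigmoidal learning rate schedule.

    Phase 1 (early epochs): lr stays close to lr_high to traverse saddle points quickly.
    Phase 2 (late epochs):  lr decays smoothly toward lr_low to settle into the minimum.
    """
    # Training progress in [0, 1]; the sigmoid's inflection point sits at the midpoint.
    progress = epoch / max(total_epochs - 1, 1)
    decay = 1.0 / (1.0 + math.exp(steepness * (progress - 0.5)))
    return lr_low + (lr_high - lr_low) * decay

# Example: print the schedule over 20 epochs.
if __name__ == "__main__":
    for epoch in range(20):
        print(epoch, round(sigmoidal_lr(epoch, 20), 5))
```

In practice, such a schedule could be plugged into a standard SGD training loop by recomputing the learning rate at the start of each epoch; the steepness parameter controls how abruptly training switches from the exploratory high-rate phase to the fine-tuning low-rate phase.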