Abstract
Optimizing learning rates (LRs) in deep learning (DL) has long been challenging. Previous solutions, such as learning rate scheduling (LRS) and adaptive learning rate (ALR) algorithms like RMSProp and Adam, add complexity by introducing new hyperparameters, increasing the cost of model training through expensive cross-validation experiments. These methods mainly focus on local gradient patterns, which may be ineffective in scenarios with multiple local optima near the global optimum. A new technique called Learning Rate Tuner with Relative Adaptation (LRT-RA) is introduced to tackle these issues. This approach dynamically adjusts LRs during training by analyzing the global loss curve, eliminating the need for costly initial LR estimation through cross-validation. The method reduces training expenses and carbon footprint while enhancing training efficiency. It demonstrates promising results in preventing premature convergence, exhibits inherent optimization behavior, and elucidates the correlation between dataset distribution and optimal LR selection. The proposed method achieves 84.96% accuracy on the CIFAR-10 dataset while reducing power usage to 0.07 kWh, CO2 emissions to 0.05 pounds, and both SO2 and NOx emissions to 0.00003 pounds over the entire training and testing process.
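The core idea of adjusting the LR from the shape of the global loss curve, rather than from local gradient statistics, can be illustrated with a minimal sketch. The class name, window size, and decay rule below are illustrative assumptions for a PyTorch-style optimizer, not the published LRT-RA update rule.

```python
class GlobalLossCurveLR:
    """Illustrative loss-curve-driven LR adjuster (hypothetical, not LRT-RA itself)."""

    def __init__(self, optimizer, window=10, factor=0.5, min_lr=1e-6):
        self.optimizer = optimizer    # any optimizer exposing param_groups
        self.window = window          # epochs used to estimate the recent trend
        self.factor = factor          # multiplicative LR change when the curve flattens
        self.min_lr = min_lr          # lower bound on the LR
        self.history = []             # global loss curve recorded so far

    def step(self, epoch_loss):
        """Record the epoch loss and adjust the LR based on the global trend."""
        self.history.append(float(epoch_loss))
        if len(self.history) < 2 * self.window:
            return
        recent = sum(self.history[-self.window:]) / self.window
        earlier = sum(self.history[-2 * self.window:-self.window]) / self.window
        # If the global loss curve has flattened, shrink the LR so the
        # optimizer can settle; otherwise leave it unchanged.
        if recent >= 0.999 * earlier:
            for group in self.optimizer.param_groups:
                group["lr"] = max(group["lr"] * self.factor, self.min_lr)
```

In a typical training loop this would be called once per epoch, e.g. `scheduler.step(mean_train_loss)`, after the parameter updates for that epoch.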