Abstract

In contemporary machine learning, training datasets are typically divided into batches, and models are updated incrementally over batch iterations to save memory and reduce overfitting. However, choosing optimal hyperparameters such as the learning rate, batch size, and number of epochs remains a challenge that often relies on empirical insight. This paper explores a novel method, Adaptive Gradient Conflict Rate (AdaGCR), for optimizing the training process. It builds on the gradient conflict rate, which reflects the model's position within a batch model set, and adjusts the global learning rate accordingly. The proposed method is tested by training a Deep Neural Network (DNN) on the MNIST dataset, representing simple tasks, and a ResNet-18 model on the CIFAR-10 dataset, representing more complicated tasks closer to real-world problems. Experiments on the DNN demonstrate the proposed method's effectiveness in reducing overfitting and improving convergence, particularly with a well-suited initial learning rate. However, applying it to more complex models such as ResNet-18 may require further refinements, such as layer-specific learning rate adjustments. Future research should focus on fine-tuning AdaGCR and extending its utility across diverse machine learning models and tasks.
