Abstract. As machine learning is increasingly used to simplify everyday tasks, different models applied to the same problem can produce markedly different predictions and results. To improve model performance, this paper investigates the influence of the learning rate hyperparameter, demonstrated through the convergence behavior of the loss function. Experiments show that different models exhibit significant performance differences when processing the same dataset, and that the choice of learning rate likewise has a substantial impact on performance. Therefore, after selecting an appropriate model, one should also tune this hyperparameter so that training proceeds more smoothly. The analysis provides a basic understanding of suitable learning rates for transformer, diffusion, and RNN models trained on MNIST, making it easier for practitioners to set better hyperparameters and obtain better predictions and decisions when using these three models.