Abstract

Many convolutional neural network (CNN) based models have been developed for computer vision. Well-known models such as GoogLeNet, Residual Network (ResNet), Visual Geometry Group (VGG), and You Only Look Once (YOLO) differ in architecture and performance, so choosing among them can be difficult for those just starting to study image classification. To address this problem, we introduce the GoogLeNet, ResNet-18, and VGG-16 models and compare their architectures, features, and performance. We trained and tested all three models on the CIFAR-100 dataset with the same hyperparameters. From the results (test accuracy, average test loss, and training loss), we analyze the curves for trends, key points, rates of increase, and other features, and then relate these observations to the architecture of each model to draw our conclusions and offer suggestions that help beginners choose a suitable model. The experimental results show that ResNet-18 is a good choice for training on CIFAR-100: it performs well after training, has a low time complexity, and converges fastest of the three. GoogLeNet is the second choice: its performance is similar to, and even better than, that of ResNet-18, but training it is time-consuming. VGG-16 is not recommended in this experiment because it performs worst while having a training complexity similar to that of ResNet-18.
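
As a hedged illustration of two of the metrics named above (test accuracy and average test loss), the following minimal Python sketch computes both from raw class scores. The function name `evaluate`, the choice of cross-entropy as the loss, and the toy inputs are assumptions made for illustration only; the abstract does not state which loss function or framework the experiments used.

```python
import math


def evaluate(logits, labels):
    """Return (accuracy, average cross-entropy loss) over a set of examples.

    logits: list of per-example score lists, one score per class.
    labels: list of integer class indices, one per example.
    Cross-entropy is an assumed loss; the source does not name one.
    """
    correct = 0
    total_loss = 0.0
    for scores, label in zip(logits, labels):
        # Numerically stable log-sum-exp: subtract the max score first.
        top = max(scores)
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
        # Cross-entropy for one example: -log softmax(scores)[label].
        total_loss += log_sum_exp - scores[label]
        # The predicted class is the index of the highest score.
        predicted = max(range(len(scores)), key=lambda i: scores[i])
        correct += int(predicted == label)
    n = len(labels)
    return correct / n, total_loss / n


# Toy example with 3 classes: the first prediction is right, the second wrong.
accuracy, avg_loss = evaluate([[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]], [0, 1])
print(accuracy, avg_loss)
```

Under this assumed loss, a 100-class model that outputs identical scores for every class has an average loss of ln(100), roughly 4.605, which can serve as a chance-level reference point when reading loss curves for a 100-class dataset such as CIFAR-100.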
