Convolutional Neural Networks (CNNs) have become a pivotal tool for processing visual data. Despite their widespread use, the impact of CNN depth on performance remains under-explored. This study evaluates CNN architectures of three depths (two-layer, four-layer, and five-layer) on MNIST (Modified National Institute of Standards and Technology), a well-known benchmark dataset for handwritten digit recognition. Experimental results show that the four-layer model achieved the highest average accuracy, 99.76%, while the five-layer model, despite its additional complexity, trailed slightly at 99.73% and required significantly longer training time. In conclusion, deeper networks can increase accuracy, but they can also add computational cost without significant gains in performance. This research clarifies the role of CNN depth and can guide model selection for image classification tasks.
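The cost side of the depth trade-off can be illustrated with a rough parameter count. The sketch below is not the architecture evaluated in this study: the abstract does not specify kernel sizes, channel widths, or pooling, so the 3x3 kernels, the width that doubles per layer starting from 32, the 2x2 pooling after the first two layers, and the helper names `conv_params` and `cnn_param_count` are all assumptions made for illustration. Only the 28x28 single-channel input and the 10 output classes come from MNIST itself.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one 2-D convolution layer (assumed 3x3 kernel)."""
    return (kernel * kernel * in_channels + 1) * out_channels


def cnn_param_count(depth, base_width=32, num_classes=10, input_size=28):
    """Trainable parameters of a hypothetical CNN with `depth` conv layers."""
    params = 0
    in_channels = 1      # MNIST images are single-channel grayscale
    size = input_size    # MNIST images are 28x28 pixels
    for layer in range(depth):
        # Assumption: channel width doubles with each layer (32, 64, 128, ...).
        out_channels = base_width * 2 ** layer
        params += conv_params(in_channels, out_channels)
        # Assumption: 'same' padding, with 2x2 pooling after the first two layers only.
        if layer < 2:
            size //= 2
        in_channels = out_channels
    # Final fully connected classifier over the flattened feature map.
    params += (size * size * in_channels + 1) * num_classes
    return params


# Compare the three depths considered in the study under these assumptions.
for depth in (2, 4, 5):
    print(depth, cnn_param_count(depth))
```

Under these assumptions the parameter count grows steeply with each added layer, which is consistent in spirit with the longer training time reported for the five-layer model; the actual figures depend on the real layer configuration.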