Abstract

Abstract In this paper, the difficult problem of character recognition in natural scenes caused by many factors such as variability of light in the natural scene, background clutter and inaccurate viewing angle, and inconsistent resolution. Based on the deep learning framework PyTorch, a convolutional neural network is implemented. Based on the classic LeNet-5 network, the network optimizes the input layer to accept three-channel images, changes the pooling method to maximum pooling to simplify parameters, and the activation function is replaced by Rectified Linear Unit with faster convergence. The cross-entropy loss is used instead of the minimum mean square error to mitigate the slow learning. Furthermore, we also enroll the gradient descent optimization algorithm RMSprop and L2 regularization to improve the accuracy, speed up the convergence and suppress the over-fitting. The experiment results show that our model achieved an accuracy of 92.32% after training for 7h24min on the street view house number(SVHN) dataset, effectively improving the performance of LeNet-5.

Highlights

  • The traditional method of classifying house numbers from natural scene images is usually to use manual feature extraction[1,2] and template matching[3,4]

  • In order to identify the house number of the corridor environment, Zhang Shuai et al used the combination of Robert edge detection and morphological operation to locate the position of the house number image, and divide the house number by horizontal and vertical projection method, tilt correction, etc., and use pattern recognition to identify the house number [5]

  • The training process requires a large amount of data compared with the traditional method, the convolutional neural network can automatically summarize the target feature from these data

Read more

Summary

INTRODUCTION

The traditional method of classifying house numbers from natural scene images is usually to use manual feature extraction[1,2] and template matching[3,4]. Ma Liling et al used the linear discriminant linear local tangent space alignment algorithm (ODLLTSA) and the support vector machine (SVM) method to identify the house number, use the extracted features to train the SVM classifier, and use the SVM classifier to the new house number classification [6]. Overcome the shortcomings of manual design features that are timeconsuming and labor-intensive, have poor general use and require high experience in the designer field It is precise because of these advantages of convolutional neural networks that a large number of researchers have begun to apply it to solve character recognition problems. In response to this situation, we implemented a LeNet-5-based neural network based on the deep learning framework PyTorch and achieved an accuracy of 92.32% on the SVHN dataset at a time of 6 hours and 17 minutes

Network structure
Comparison effects of different optimizers
EXPERIMENT AND ANALYSIS
Data augmentation
CONCLUSION
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.