Abstract

Scene text detection and recognition are vital challenging tasks in computer vision, which are to detect and recognize sequences of texts in natural scenes. Recently, researchers have investigated a lot of state-of-the-art methods to improve the accuracy and efficiency of text detection and recognition. However, there has been little research on text detection and recognition in natural scenes in Vietnam. In this paper, a deep learning-based empirical investigation of Vietnamese scene text detection and recognition is presented. Firstly, four detection models including differentiable binarization network (DBN), pyramid mask text detector (PMTD), pixel aggregation network (PAN), and Fourier contour embedding network (FCEN), are employed to detect text regions from the images. Then, four text recognition models including convolutional recurrent neural network (CRNN), self-attention text recognition network (SATRN), no-recurrence sequence-to-sequence text recognizer (NRTR), and RobustScanner (RS) are also investigated to recognize the texts. Moreover, data augmentation methods are also applied to enrich data for improving the accuracy and enhancing the performance of scene text detection and recognition. To validate the effectiveness of scene text detection and recognition models, the VinText dataset is employed for evaluation. Empirical results show that PMTD and SATRN achieve the highest scores among the others for text detection and recognition, respectively. For knowledge-sharing, our implementation is publicly available at https://github.com/ThorPham/VN_scene_text_detection_recognition.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call