Abstract

This paper proposes an enhancement of an automatic text recognition system for extracting information from the front side of the Vietnamese citizen identity (CID) card. First, we apply Mask-RCNN to segment and align the CID card from the background. Next, we present two approaches to detect the CID card’s text lines using traditional image processing techniques compared to the EAST detector. Finally, we introduce a new end-to-end Convolutional Recurrent Neural Network (CRNN) model based on a combination of Connectionist Temporal Classification (CTC) and attention mechanism for Vietnamese text recognition by jointly train the CTC and attention objective functions together. The length of the CTC’s output label sequence is applied to the attention-based decoder prediction to make the final label sequence. This process helps to decrease irregular alignments and speed up the label sequence estimation during training and inference, instead of only relying on a data-driven attention-based encoder-decoder to estimate the label sequence in long sentences. We may directly learn the proposed model from a sequence of words without detailed annotations. We evaluate the proposed system using a real collected Vietnamese CID card dataset and find that our method provides a 4.28% in WER and outperforms the common techniques.

Highlights

  • In computer vision, scene text recognition (STR) related to analyzing and understanding high-level semantic information from texts in the image is challenging

  • 3.3.3 Experimental Results on Text Recognition We first explored the effects of the Convolutional Recurrent Neural Network (CRNN) based attention mechanism on manually cropped text line images from the citizen identity (CID) card images

  • Tab. 4 shows the performance comparison in word error rate (WER) of the CRNN model based on the Connectionist Temporal Classification (CTC) and attention

Read more

Summary

Introduction

Scene text recognition (STR) related to analyzing and understanding high-level semantic information from texts in the image is challenging. The STR systems have been extensively and successfully utilized in various applications, such as image retrieval, driver-assisted systems, recognition of personal cards and related documents, etc. The STR systems often include two main procedures: Text detection and recognition. Text detection aims to determine the location of texts from the image. Text recognition is applied to identify the texts and generate a sequence of texts from the text images. Text recognition has been successfully used to extract information from the identity card (ID card) for different languages such as English, Chinese, Vietnamese, Spanish, etc. This study focuses on developing an approach to recognize Vietnamese text and an application to extract information from Vietnamese citizen

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.