Abstract

In this paper, a keyword detection scheme is proposed based on deep convolutional neural networks for personal information protection in document images. The proposed scheme is composed of two parts: key character detection and lexicon analysis. The first part, key character detection, is developed based on RetinaNet and transfer learning. RetinaNet, which consists of convolutional layers with a feature pyramid network and two subnets, is exploited to detect key characters within the region of interest in a document image. The second part, lexicon analysis, analyzes and combines several detected key characters to find the keywords. To train the RetinaNet model, synthetic image generation and data augmentation are exploited to yield a large image dataset. To evaluate the proposed scheme, many document images are selected for testing, and two performance measurements, IoU (Intersection over Union) and mAP (Mean Average Precision), are used. Experimental results show that the mAP rates of the proposed scheme are 85.1% and 85.84% for key character detection and keyword detection, respectively. Furthermore, the proposed scheme is superior to Tesseract OCR (Optical Character Recognition) software for detecting key characters in document images. The experimental results demonstrate that the proposed method can effectively localize and recognize keywords within noisy document images containing Mandarin Chinese words.
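The IoU metric used in the evaluation above has a standard definition for axis-aligned bounding boxes; a minimal sketch (not taken from the paper, which does not give code) is:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Corners of the intersection rectangle.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5; mAP then averages the resulting precision over recall levels and classes.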

Highlights

  • Paper documents carry a large amount of information for human communication

  • The experimental results demonstrate that the proposed scheme is superior to Tesseract OCR software for detecting key characters in noisy document images

  • A keyword detection scheme was proposed based on deep convolutional neural networks for personal information protection in document images


Summary

Introduction

Paper documents carry a large amount of information for human communication. They often contain typical elements such as text, tables, stamps, and signatures. Compared with traditional machine-learning methods developed based on handcrafted features, Deep Neural Networks (DNN) [7,8] have received more and more attention due to their excellent performance in image classification, speech recognition, fraud detection, and so on. In [16], a fast CNN-based method is proposed to automatically perform layout analysis for document images: a document image is segmented into blocks, and each block is classified by a CNN into one of three categories, i.e., text, table, or image. Certain Mandarin Chinese words such as “業主” (Property Owner) and “起造人姓名” (Name of Applicant) can be used as special information to distinguish Figure 1a from Figure 1b.
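The keyword examples above suggest how lexicon analysis can follow character detection: detected characters are ordered by position and matched against a list of known keywords. The sketch below is illustrative only (the function name, input format, and greedy matching strategy are assumptions, not the paper's exact algorithm):

```python
def match_keywords(detections, lexicon):
    """Match detected characters against a keyword lexicon.

    detections: list of (character, x_center) pairs from a character detector.
    lexicon: iterable of keyword strings, e.g. {"業主", "起造人姓名"}.
    Returns the keywords that appear in the left-to-right character sequence.
    """
    # Order detected characters by horizontal position (left to right).
    ordered = sorted(detections, key=lambda d: d[1])
    text = "".join(char for char, _ in ordered)
    # Report every lexicon keyword found as a contiguous substring.
    return [kw for kw in lexicon if kw in text]
```

A real system would also need to handle multi-line layouts and detection errors (missed or spurious characters), which a plain substring match does not cover.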

System Description
ROI Localization
CNN-Based Key Character Detection
RetinaNet Architecture
Model Training Procedure
Lexicon Analysis
Experimental Results
Keyword Detection
Comparison with Tesseract for Key Character Recognition
Conclusions and Future Work
