Automatic Privacy Detection in Scanned Document Images Based on Deep Neural Networks

Lyudmila Kopeykina, Andrey V Savchenko

doi:10.1109/rusautocon.2019.8867614

Abstract

The authors consider the problem of automatically detecting private scanned documents using text recognition and deep neural networks. The paper proposes a two-stage approach. In the first stage, text is located with the efficient EAST text detector and recognized with the Tesseract OCR engine. In the second stage, deep neural networks applied to the extracted text classify the privacy of the scanned document. A special dataset was gathered to train these networks. The experiments show that using the OCR engine for both text detection and segmentation identifies private documents relatively poorly compared with preliminary text detection by the EAST method. Moreover, conventional keyword spotting with a list of sensitive words is less accurate than the neural network-based methods. Finally, classification of a bag of the most frequent words outperforms traditional text classification techniques based on LSTM and convolutional networks.
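To make the two text-based approaches compared in the abstract more concrete, the following is a minimal sketch of the keyword-spotting baseline and of bag-of-most-frequent-words feature extraction. It is not the authors' implementation: the sensitive-word list, the sample OCR texts, the vocabulary size, and all function names are hypothetical, and the neural network that would classify the resulting feature vectors is omitted.

```python
import re
from collections import Counter

# Hypothetical sensitive-word list; the list used in the paper is not given here.
SENSITIVE_WORDS = {"passport", "salary", "diagnosis", "confidential"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_spotting(text, keywords=SENSITIVE_WORDS):
    """Baseline: flag a document as private if any sensitive word occurs."""
    return any(tok in keywords for tok in tokenize(text))


def build_vocabulary(texts, size):
    """Keep the `size` most frequent words across the training texts."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    # Sort by descending count, then alphabetically, for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:size]]


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Hypothetical strings standing in for text extracted by the OCR stage.
ocr_texts = [
    "Passport number and date of birth of the holder",
    "Meeting agenda for the project team",
    "Confidential salary statement of the employee",
]

vocabulary = build_vocabulary(ocr_texts, size=5)
features = [bag_of_words(t, vocabulary) for t in ocr_texts]
flags = [keyword_spotting(t) for t in ocr_texts]
```

Here `features` holds one fixed-length count vector per document, which is the kind of input a small feed-forward classifier could be trained on, while `flags` holds the decisions of the keyword baseline. With such a tiny corpus the vocabulary is dominated by function words; a realistic setup would build it from a much larger training set.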
