Abstract

Retrieving text embedded within images is challenging in real-world settings. Problems such as low resolution and text orientation can hinder the extraction of information. These problems are common in environments such as the Tor Darknet and in Child Sexual Abuse images, where text extraction is crucial to the prevention of illegal activities. In this work, we evaluate eight text recognizers and, to improve transcription performance, combine them with rectification networks and super-resolution algorithms. We test our approach on four state-of-the-art datasets and two custom datasets: TOICO-1K and Child Sexual Abuse (CSA)-text, based on text retrieved from the Tor Darknet and from Child Sexual Exploitation Material, respectively. On the TOICO-1K dataset, we obtained a correctly-recognized-words score of 0.3170 when we combined Deep Convolutional Neural Networks (CNN) with rectification-based recognizers. On the CSA-text dataset, applying resolution enhancements achieved a final score of 0.6960. The highest performance increase was achieved on the ICDAR 2015 dataset, with an improvement of 4.83% when combining the MORAN recognizer with the Residual Dense resolution approach. We conclude that rectification outperforms super-resolution when the two are applied separately, while their combination achieves the best average improvement across the chosen datasets.

Highlights

  • The automatic detection, segmentation and recognition of text in natural images, known as text spotting, is a challenging task with multiple practical applications [1,2,3]

  • We address the problem of performing text recognition on non-horizontal and low-resolution text [25] by enhancing images using two different techniques: rectification networks [26,27], which correct an image’s orientation to reduce transcription mismatches, and super-resolution techniques, which improve the image quality before recognition

  • We address the problem of improving the performance of text recognition for forensic applications by assessing the use of rectification networks together with super-resolution techniques
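The enhancement-then-recognition idea in the highlights above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `transcribe` and `correctly_recognized_words`, the `Image` placeholder type, the order of the two stages (super-resolution first, then rectification) and the case-insensitive exact-match scoring are all hypothetical choices made for the example. The recognizer and both enhancement stages are passed in as plain callables, so any concrete model could be substituted.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Placeholder image type for the sketch; a real pipeline would use arrays or tensors.
Image = List[List[int]]
Stage = Callable[[Image], Image]      # e.g. a super-resolution or rectification model
Recognizer = Callable[[Image], str]   # maps a cropped word image to a transcription


def transcribe(
    image: Image,
    recognizer: Recognizer,
    super_resolve: Optional[Stage] = None,
    rectify: Optional[Stage] = None,
) -> str:
    """Apply the optional enhancement stages, then run the recognizer.

    Assumption: super-resolution runs before rectification. The source text
    does not fix this order.
    """
    if super_resolve is not None:
        image = super_resolve(image)
    if rectify is not None:
        image = rectify(image)
    return recognizer(image)


def correctly_recognized_words(
    samples: Iterable[Tuple[Image, str]],
    recognizer: Recognizer,
    super_resolve: Optional[Stage] = None,
    rectify: Optional[Stage] = None,
    case_sensitive: bool = False,
) -> float:
    """Return the fraction of word images whose transcription matches the label.

    Assumption: a word counts as correct only on an exact match, compared
    case-insensitively unless case_sensitive is True.
    """
    total = 0
    correct = 0
    for image, truth in samples:
        predicted = transcribe(image, recognizer, super_resolve, rectify)
        if not case_sensitive:
            predicted, truth = predicted.lower(), truth.lower()
        correct += int(predicted == truth)
        total += 1
    # An empty sample set scores 0.0 rather than raising a division error.
    return correct / total if total else 0.0
```

Calling `correctly_recognized_words` once with no stages, once with each stage alone and once with both gives the kind of side-by-side comparison the highlights describe, with the recognizer held fixed.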


Summary

Introduction

The automatic detection, segmentation and recognition of text in natural images, known as text spotting, is a challenging task with multiple practical applications [1,2,3]. Although specialized analysts in forensic laboratories can recognize multiple objects and text in an image with little or no conscious effort, this manual analysis becomes unfeasible within the time constraints of most investigations [6]. The development and implementation of fast, automatic and efficient tools for the analysis of images and videos have therefore become crucial for the forensic field [4,5]. Multiple objects can be detected and classified within an image with high performance

