Optical Character Recognition (OCR) poses a crucial challenge within the realm of computer vision research, as it plays a pivotal role in converting vast amounts of unstructured text data into structured formats to support diverse artificial intelligence applications. The OCR process encompasses two core components: text detection and text recognition. Text detection involves identifying and extracting text regions, achieved through either object detection or segmentation techniques, while text recognition focuses on accurately deciphering the content within these identified regions. In recent years, remarkable strides have been made in the domain of text recognition, primarily driven by deep learning-based models. These models eliminate the need for manual feature processing and excel in recognizing text even within complex scenes, surpassing the performance of traditional text recognition methods and subsequently emerging as the dominant approach. The objective of this paper is to present a comprehensive survey of both text detection and text recognition models. Firstly, we systematically categorize and provide an overview of existing off-the-shelf text detection methods. Subsequently, we conduct an in-depth investigation of six distinct text recognition models, taking into account their unique implementations. Additionally, we explore and analyze the principal datasets that currently prevail in the field of text detection and recognition. Furthermore, this research entails a meticulous performance comparison of various text detection algorithms on the CTW1500, TotalText, and ICDAR2015 datasets. Additionally, we evaluate and scrutinize the efficacy of mainstream text recognition algorithms on the IIIT-5K, SVT, ICDAR2013, SVT-P, CUTE80, and ICDAR2015 datasets. Finally, we conclude with a discussion on the future development and research trends concerning text detection and recognition, providing insights that can further drive progress in this crucial area.
Read full abstract