Scene text is the text we see in our outdoor surroundings. It appears in custom fonts, sizes, shapes and colors that often differ from the fonts and styles typically found in documents. Scene text recognition has grown in popularity due to its significance in the development of smarter automated systems. Although scene text detection and recognition have been studied extensively in other languages, similar work on the Bengali language is limited by the lack of large-scale usable datasets. In this paper, we propose a multipurpose system for Bengali scene text that allows easy dataset collection via crowdsourcing and supports annotation, classification and detection tasks on the same platform. It comes with an Android-based application that can be used to capture or load images from the device, select Bengali text regions in the images and annotate them. The images and labels are stored in the cloud. A Python-based script performs real-time data processing, analysis, and detection tasks. The proposed system has also been tested on a crowdsourced Bengali scene text dataset collected with the system itself. On this crowd-contributed dataset, the classification model achieved an accuracy of 97%, and the detection model achieved a Mean Average Precision score of 92.