Computer Vision Based Automatic Margin Computation Model for Digital Document Images
In typography, a margin is the space between the text content and the edges of a document, and it is often essential information for the consumer of the document, whether digital or physical. In the present age of digital disruption, it is customary to store documents digitally and to extract information from them automatically when necessary. Margin is one such piece of non-textual information that matters for some business processes, and the demand for computing margins algorithmically is growing to facilitate robotic process automation (RPA). We propose a computer vision-based text localization model that uses classical digital image processing (DIP) techniques such as smoothing, thresholding, and morphological transformation to programmatically compute the top, left, right, and bottom margins of a digital document image. The model was tested with different noise filters and with structuring elements of various shapes and sizes, leading to the selection of the bilateral filter and line-shaped structuring elements for removing the noise that most commonly results from scanning. The proposed model targets text document images rather than natural scene images; existing benchmark models developed for text localization in natural scene images did not reach the expected accuracy on such documents. The model is validated on 485 document images from a real-time business process of a reputed TI company. The results show that 91.34% of the document images achieved an intersection over union (IoU) value above 90%, which is well beyond the accuracy range determined by the company for that specific process.
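The final stage described above, turning localized text into four margins and scoring the result with IoU, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes the smoothing, thresholding, and morphology steps have already produced a binary mask (1 for text pixels, 0 for background), and it assumes IoU is computed between a predicted and a ground-truth content rectangle. The names `content_box`, `margins`, and `iou` are hypothetical.

```python
from typing import Dict, List, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def content_box(mask: List[List[int]]) -> Box:
    """Return the bounding box of all nonzero (text) pixels in a binary mask."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask contains no text pixels")
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def margins(mask: List[List[int]]) -> Dict[str, int]:
    """Compute top, left, right, and bottom margins (in pixels) of the page."""
    height, width = len(mask), len(mask[0])
    left, top, right, bottom = content_box(mask)
    return {
        "top": top,
        "left": left,
        "right": width - right,
        "bottom": height - bottom,
    }


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


if __name__ == "__main__":
    # Toy 6x8 "page": text occupies rows 1..3 and columns 2..5.
    page = [[0] * 8 for _ in range(6)]
    for y in range(1, 4):
        for x in range(2, 6):
            page[y][x] = 1

    print(margins(page))  # {'top': 1, 'left': 2, 'right': 2, 'bottom': 2}

    predicted = content_box(page)   # (2, 1, 6, 4)
    ground_truth = (2, 1, 6, 5)     # hypothetical annotation, one row taller
    print(iou(predicted, ground_truth))  # 0.75
```

In a full pipeline the mask would typically come from an image library's bilateral filter, thresholding, and morphological operations; those steps are omitted here so that the sketch stays self-contained.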
- Book Chapter
- 10.4018/978-1-6684-9809-5.ch016
- Jun 30, 2023
It's crucial to thoroughly research the prior history of the cryptocurrency market, in this case Dogecoin, before considering investing in the financial market, especially the cryptocurrency market. The authors want to create a Python project that forecasts Dogecoin's price. To accomplish this, researchers need to gather all available data on Dogecoin's price history and use it to create mathematical formulas that will decide the currency's pricing. In order to help people grasp this data, the authors also create a chart to display it all.
- Book Chapter
- 10.4018/978-1-6684-9809-5.ch015
- Jun 30, 2023
A relational database schema was designed for the Kosovo Hospital Management System. The objective was to create a platform to store patient information, appointments, billing, and feedback, with the main goal of improving patient care. Eleven unique tables were created, with primary and foreign keys defining the relationships between them. The tables support the management of patient and doctor schedules, medical records, allergies, blood donations, medical equipment, and hospital department administration, among others. The management system benefits healthcare institutions such as hospitals, clinics, and blood banks, as well as medical staff such as doctors and nurses. Patients also benefit through better healthcare services, faster appointment scheduling, and more.
- Book Chapter
- 10.4018/978-1-6684-9809-5.ch009
- Jun 30, 2023
We might become interested in stock market investment during the course of our lives. To invest, however, we first need a clear understanding and analysis of the market, because we may be unsure whether to buy or sell stocks. It can be challenging to determine whether a stock will rise or decline in value or whether it will be a successful investment. If the stock declines, we may be unsure whether to sell or hold on to it. Given the large number of investors who purchase stocks globally and the potential for significant losses, stock market analysis is crucial. The goal is to develop a project that analyzes the stock market and aids our decision-making when purchasing and selling stocks.
- Book Chapter
- 10.4018/978-1-6684-9809-5.ch003
- Jun 30, 2023
Diabetes is characterized by either insufficient or inefficient insulin production by the body. High blood glucose levels result from this, which over time can harm a number of tissues and organs in the body. Diabetes can be brought on by a specific age, obesity, inactivity, insufficient physical activity, inherited diabetes, lifestyle, poor diet, hypertension, etc. This chapter explores modeling requirements with diabetes using supervised machine learning techniques.
- Book Chapter
- 10.4018/978-1-6684-9809-5.ch002
- Jun 30, 2023
The network of billionaires can be analyzed through their data and information. Python can take data and information from many sources and analyze them from a single database. The idea of the report is to compare databases of all the billionaires in the world at different time frames. After collecting all the needed data, such as personal information and integrable data, the program outputs them in a table. The program takes datasets in Comma Separated Values files and produces its output on the Google Colab platform. The authors give importance to net worth and other publicly sensitive information that will be analyzed by the Python program. The purpose of analyzing billionaires around the world is to find similarities among countries with the potential for greater economic development.
- Book Chapter
- 10.4018/978-1-6684-9809-5.ch014
- Jun 30, 2023
The number of billionaires in a country gives us significant insights regarding the country's corporate and economic landscape. The number of billionaires indicates the country's economic performance, financial market strength, and amount of support for entrepreneurship and innovation. The presence of a large number of billionaires suggests that the country has a solid business climate that promotes the growth and success of affluent individuals. These people's riches may have been built through creative business methods, technical developments, or savvy investments. However, it is crucial to highlight that the number of billionaires does not always imply a thriving economy. Our project seeks to investigate the global links of billionaires and their commercial specialization tactics.
- Book Chapter
- 10.4018/978-1-6684-9809-5.ch012
- Jun 30, 2023
This chapter focuses on software development principles and discusses each principle thoroughly with diagrammatic representation. It also includes the definition of UML (unified modeling language) modelling with an explanation regarding how UML modelling takes place and a detailed example. It also focuses on software testing methods, with each method definition and diagrams well explained. A simple case study situation is taken to discuss the example of UML model. This chapter's main objective is to focus on all key points of software development testing and model design techniques precisely.
- Book Chapter
- 10.4018/978-1-6684-9809-5.ch008
- Jun 30, 2023
The music industry generates an enormous amount of data, which makes classifying and organizing that data into a genre a very difficult task. A potential solution to that problem is to cluster the music using machine learning. Machine learning algorithms might enhance personalized suggestions, search engines, and music categorization systems by creating a model which can precisely identify different genres relying on their acoustic and subjective properties. Recent research suggests that even though there is a large overlap across genres, with machine learning algorithms, we can properly categorize music genres by recognizing differences as well as similarities between them. In more general terms, grouping musical styles using machine learning has several uses in the music industry. It can speed up the identification of new musical styles and encourage cross-genre collaborations among musicians.
- Conference Article
3
- 10.1109/iccons.2017.8250523
- Jun 1, 2017
Nowadays, reading words from an unconstrained and noisy image is not easy. Text localization and recognition in images is a research area that aims to develop a computer system able to read text from images automatically. The objective of this study is to propose a new method for text localization and recognition in natural scene images with complex backgrounds. In this paper, a hybrid methodology is suggested which extracts text from natural scene images with chaotic backgrounds. The proposed approach involves an embedded system that combines software with hardware. First, superimposed text regions in an image are extracted based on character descriptor features such as area, bounding box, perimeter, Euler number, and horizontal crossings. In the second step, the superimposed text regions are classified as text or non-text using character descriptors and an SVM classifier. In the third step, multiple lines are detected in the localized superimposed text regions, and line segmentation is performed using horizontal profiles. In the final step, each character of the segmented line is extracted using vertical profiles. In the system, an ARM7 (LPC2138) is interfaced with a personal computer. GPS and GSM modules are also interfaced with the ARM7 (LPC2138). The English text extracted from an image is passed to the ARM7 (LPC2138) and displayed on an LCD. The GPS module obtains the location coordinates of the image. The GSM module sends an SMS to a local tourist guide company (e.g. Just Dial) to update information about the natural scene image (e.g. shop names, hotel names). The experiments were carried out using images drawn from the ICDAR 2013 and SVT 2010 datasets. The extracted text and location results are played back using an APR33A3 IC. This system will be helpful for tourists and the visually impaired. The results demonstrate the effectiveness of the proposed method, which can be used as an efficient method for text localization and recognition in natural scene images.
- Research Article
4
- 10.5815/ijigsp.2016.05.02
- May 8, 2016
- International Journal of Image, Graphics and Signal Processing
The objective of this study is to propose a new method for text region localization and character extraction in natural scene images with complex backgrounds. In this paper, a hybrid methodology is suggested which extracts multilingual text from natural scene images with cluttered backgrounds. The proposed approach involves four steps. First, potential text regions in an image are extracted based on edge features using the Contourlet transform. In the second step, potential text regions are classified as text or non-text using GLCM features and an SVM classifier. In the third step, multiple lines are detected in the localized text regions, and line segmentation is performed using horizontal profiles. In the last step, each character of the segmented line is extracted using vertical profiles. The experiments were carried out using images drawn from the authors' own dataset and the ICDAR dataset. The performance is measured in terms of precision and recall. The results demonstrate the effectiveness of the proposed method, which can be used as an efficient method for text recognition in natural scene images.
- Conference Article
2
- 10.1145/3448891.3450336
- Dec 7, 2020
Universally, scene text in natural images is an expressive means of communication. Text in an image can provide important information that enhances the interpretation of the image, for instance text on product packages and road signs. Recently, scene text detection has been recognized as an important research field in computer vision. However, text detection in natural scene images has been challenging due to complicated scene backgrounds involving varying fonts, sizes, image resolutions, etc. To date, different methods of detecting text in natural scene images have been proposed for horizontal, arbitrarily-oriented, and curved text, but there is a lack of detection of vertically-oriented text in natural scene images. Hence, this research proposes a framework for detecting top-to-bottom, bottom-to-top, and horizontally-stacked vertical text in natural scene images. In this paper, the Multi-directional Text Detector (MTD) is modelled and developed to locate vertically-oriented text in natural scene images, and the Vertical Scene Texts Dataset-700 (VSTD-700) is developed. Preliminary testing shows that the detection success rate of MTD is 87% on ICDAR 2013, 73% on ICDAR 2015, 71% on MSRA-TD500, and 87% on VSTD-700. Hence, the results show that MTD is able to detect vertically-oriented text in addition to horizontally-oriented and arbitrarily-oriented text in natural scene images simultaneously.
- Conference Article
9
- 10.1109/iceca.2017.8203708
- Apr 1, 2017
Nowadays, reading words from an unconstrained and noisy image is not easy. Text localization and recognition in images is a research area that aims to develop a computer system able to read text from images automatically. Optical Character Recognition (OCR) tools give good results when reading text from an image. The objective of this study is to propose a new method for text localization and recognition in natural scene images with complex backgrounds. In this paper, a hybrid methodology is suggested which extracts text from natural scene images with chaotic backgrounds. The proposed approach involves four stages. First, superimposed text regions in an image are extracted based on character descriptor features such as area, bounding box, perimeter, Euler number, and horizontal crossings. In the second step, the superimposed text regions are classified as text or non-text using character descriptors and an SVM classifier. In the third step, multiple lines are detected in the localized text regions, and line segmentation is performed using horizontal profiles. In the final step, each character of the segmented line is extracted using vertical profiles. The experiments were carried out using images drawn from the ICDAR 2013 and SVT 2010 datasets. The results demonstrate the effectiveness of the proposed method, which can be used as an efficient method for text localization and recognition in natural scene images.
- Conference Article
11
- 10.1109/iciip.2017.8313739
- Dec 1, 2017
Textual matter present in a natural scene image provides indispensable information about it. The semantics and information present in natural scene images can be perceived by extracting the text regions in them. Detection and localization of text in natural scene images is a challenging image analysis task due to varying font size, font type, and illumination. In this paper, we propose a hybrid approach for text detection and localization based on a text confidence score that uses three attributes, namely stroke width dissimilarity, color dissimilarity, and occupy rate convex area, to discern text and non-text constituents. The aim of this paper is to achieve fast detection and localization of text regions in low-resolution and blurred images. To accomplish this, the possible candidate regions are extracted using edge smoothing by a fast guided filter followed by MSER. The text confidence score of these constituents is calculated using a Bayesian framework with the help of the above-mentioned three attributes. Experimental results on the benchmark ICDAR 2013 testing dataset show the efficacy of our method in terms of precision, recall, and f-measure.
- Research Article
4
- 10.1016/j.neucom.2020.02.084
- Feb 24, 2020
- Neurocomputing
Unified non-uniform scale adaptive sampling model for quality assessment of natural scene and screen content images
- Research Article
21
- 10.1016/j.procs.2012.09.126
- Jan 1, 2012
- Procedia Computer Science
A Hybrid Approach to Localize Farsi Text in Natural Scene Images
- Conference Article
- 10.1109/icsai.2016.7811070
- Nov 1, 2016
A new text location method based on color reduction and an AdaBoost classifier is proposed in this paper for extracting text regions in natural scene images with complex backgrounds. First, the images are segmented into several layers using color reduction and mean shift image segmentation techniques. Then, in order to pick up the potential text regions, each layer is processed using connected component analysis, text region identification, text region merging, etc. Finally, HOG (Histogram of Oriented Gradient) and LBP (Local Binary Pattern) features are extracted from the candidate text regions, and an AdaBoost classifier is applied to classify text and non-text regions. A series of experiments on a natural scene image database indicates that our method can effectively improve text location in natural scene images with complex backgrounds, showing the effectiveness of the proposed approach.
- Conference Article
11
- 10.1109/iccic.2015.7435688
- Dec 1, 2015
Text in camera-captured images contains important and useful information. Text in images can be used for identification, indexing, and retrieval. Detection and localization of text in camera-captured images is still a challenging task due to the high variability of text appearance. In this paper, we propose an efficient algorithm for detecting and localizing text in natural scene images. The method is based on texture feature extraction using first- and second-order statistics. The work is divided into two stages. In the first stage, text regions are detected using texture features, and discriminative functions are used to filter out non-text regions. In the second stage, the detected text regions are merged and localized. The experimental results show that the proposed approach efficiently detects and localizes text of various sizes, fonts, orientations, and languages.
- Conference Article
2
- 10.1109/iccpct.2013.6528865
- Mar 1, 2013
Detection and localization of text in natural scene images is important and can provide a much truer form of content-based image analysis if the text can be extracted and harnessed efficiently. This problem is challenging because of complex backgrounds, variations in text font, size, and line orientation, and non-uniform illumination. A new unsupervised text detection algorithm is proposed in this paper. In this approach, scale space and morphological operations are used for edge detection. The non-text components are efficiently filtered out using scale decomposition and a 2D Gaussian low-pass filter. They are extracted based on the observation that the edges of a character can be extracted from complex scenes by taking into consideration the high similarities in length and aspect ratio. The proposed method yielded high precision when evaluated on the ICDAR 2003 dataset.
- Conference Article
- 10.1145/3095713.3095724
- Jun 19, 2017
Detecting text in natural scene images is a challenging task. In this paper, we propose a character-level end-to-end text detection algorithm in natural scene images. In general, text detection tasks are categorized into three parts: text localization, text segmentation, and text recognition. The proposed method aims not only to localize but also to recognize text. To do these tasks successfully, the proposed method consists of four steps: character candidate patch extraction, patch classification using ensemble of ResNets, non-character region elimination, and character region grouping via self-tuning spectral clustering. In the character candidate patch extraction step, character candidate patches are extracted from the image by using both edge information from multi-scale images and Maximally Stable Extremal Regions (MSERs). Then each patch is classified into either character patch or non-character patch by using the deep network that is composed of three ResNets with different hyper-parameters. Text regions are determined by filtering out non-character patches. In order to make further reduction of classification errors, character characteristics are employed to compensate classification results of the ensemble of ResNets. To evaluate the text detection performance, character regions are grouped via self-tuning spectral clustering. The proposed method shows competitive performance on the ICDAR 2013 dataset.
- Conference Article
10
- 10.1109/iccsce.2018.8685019
- Nov 1, 2018
Text recognition plays an important role in recognizing text presented in images, as it provides important information. Scene text recognition has been an active research topic, with rapid development aimed at improving the reliability and accuracy of text recognition. However, scene text recognition is challenging because images may have inconsistent lighting, low resolution, and blurriness. In addition, scene text is usually taken from outdoor signboards, signage, and road signs, which contain various orientations and fancy font styles to attract attention. Various researchers have proposed methods for recognizing different orientations of scene text, such as horizontal, curved, and rotated text. However, to date there is a lack of research on recognizing vertical text in natural scene images. In this research, a model for effective automatic recognition of vertical text in natural scene images is proposed, consisting of two major processes: text localization and segmentation, and text recognition. The proposed model recognizes three different types of vertical scene text: top-to-bottom vertical text, bottom-to-top vertical text, and horizontal-stacked vertical text.
- Conference Article
3
- 10.1109/icpr.2016.7899796
- Dec 1, 2016
With the rapid increase of multimedia data, textual content in an image has become a very important source of information for several applications like navigation, image search and retrieval, image understanding, captioning, machine translation and several others. Scene text localization is the first step towards such applications and most current methods focus on generating a small set of high precision detectors rather than obtaining large set of detections covering all text patches. In this work we propose a novel hybrid framework for text localization which uses character level recognition recursively in a feedback mechanism to refine text patches and reduce false positives. We use popular MSER algorithm at multiple scales as an initial region proposal algorithm and several filtering stages recursively to improve precision as well as maximize recall. We aim at achieving high recall rather than achieving higher precision since several robust word recognition systems are already available. The word recognition systems are mature enough to produce highly accurate results if provided with maximum amount of regions rather than providing small set of highly precise text patches and losing several other text regions. The main contribution of this paper is the use of character recognizer within a novel feedback mechanism to recursively search for text regions in the neighborhood of previously detected text patches. Using 3 publicly available benchmark datasets (ICDAR2011, MSRA TD-500 and OSTD), we demonstrate the efficacy of the proposed framework for text localization.
- Conference Article
3
- 10.1109/iccs.2018.00029
- Aug 1, 2018
With the advancement of digital technology, the use of multimedia devices such as portable and mobile cameras has grown at a rapid pace. Nowadays, these portable devices are used to capture various scene images, which usually contain text. The text in scene images sometimes contains useful information that needs to be recognized for various applications. However, text detection in natural scene images is a very challenging process due to factors such as uneven illumination, arbitrary layouts, perspective distortion, and warped text. In order to overcome these challenges, various researchers are studying how to detect text in natural scene images. In this paper, the author explains the various text detection techniques that have previously been used for text detection in natural images. In addition, the author describes the new era of technology, i.e. deep learning-based techniques for text detection in natural scene images.
- Research Article
77
- 10.1016/j.sigpro.2017.10.025
- Oct 31, 2017
- Signal Processing
Saliency-induced reduced-reference quality index for natural scene and screen content images