Abstract

Manually processing invoices which are in the form of scanned photocopies is a time-consuming process. There is a need to automate the task of extraction of data from the invoices with a similar format. In this paper we investigate and analyse various techniques of image processing and text extraction to improve the results of the optical character recognition (OCR) engine, which is applied to extract the text from the invoice. This paper also proposes the design and implementation of a web enabled invoice processing system (IPS). The IPS consists of an annotation tool and an extraction tool. The annotation tool is used to mark the fields of interest in the invoice which are to be extracted. The extraction tool makes use of opensource computer vision library (OpenCV) algorithms to detect text. The proposed system was tested on more than 25 types of invoices with the average accuracy score lying between 85% and 95%. Finally, to provide ease of use, a web application is developed which also presents the results in a structured format. The entire system is designed so as to provide flexibility and automate the process of extracting details of interest from the invoices.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call