Automated invoice data extraction using image processing

Akanksh Aparna Manjunath, Pratiba Deenadhayalan, Shreyas Sunkad, Satish Nitin Pandit, Santhanam Nishith, Manjunath Sudhakar Nayak, Shobha Gangadhara

doi:10.11591/ijai.v12.i2.pp514-521

Open Access

Abstract

Manually processing invoices that arrive as scanned photocopies is time-consuming, so there is a need to automate data extraction from invoices that share a similar format. In this paper we investigate and analyse various image processing and text extraction techniques for improving the results of the optical character recognition (OCR) engine used to extract text from the invoice. The paper also proposes the design and implementation of a web-enabled invoice processing system (IPS), which consists of an annotation tool and an extraction tool. The annotation tool is used to mark the fields of interest in the invoice that are to be extracted. The extraction tool uses algorithms from the Open Source Computer Vision Library (OpenCV) to detect text. The proposed system was tested on more than 25 types of invoices, with average accuracy scores between 85% and 95%. Finally, for ease of use, a web application was developed that also presents the results in a structured format. The entire system is designed to be flexible and to automate the extraction of details of interest from invoices.
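The abstract describes a two-stage workflow. First, fields are marked once per invoice format with the annotation tool. Then they are extracted from new scans of that format. The code below is a minimal Python sketch of that idea under stated assumptions; it is not the authors' implementation. A hypothetical `FieldBox` template stores each annotated field as a bounding box in the pixel coordinates of the reference invoice. Boxes are rescaled to the incoming scan's resolution, and each region is cropped and passed to an OCR callable. In practice that callable would wrap an engine such as Tesseract; here it is injected so the sketch stays dependency-free. A character-level similarity from `difflib.SequenceMatcher` stands in for the accuracy score, whose exact definition the abstract does not give. The image is modelled as a plain list of pixel rows rather than an OpenCV/NumPy array.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[int]]  # row-major grid of grayscale pixel values (0-255)


@dataclass(frozen=True)
class FieldBox:
    """Hypothetical annotation record: field name plus box in reference-image pixels."""
    name: str
    x: int
    y: int
    w: int
    h: int


def scale_box(box: FieldBox, ref_size: Tuple[int, int], img_size: Tuple[int, int]) -> FieldBox:
    """Rescale a box from the annotated reference invoice (width, height) to a new scan."""
    sx = img_size[0] / ref_size[0]
    sy = img_size[1] / ref_size[1]
    return FieldBox(box.name, round(box.x * sx), round(box.y * sy),
                    round(box.w * sx), round(box.h * sy))


def crop(image: Image, box: FieldBox) -> Image:
    """Return the sub-grid covered by the box; slicing clips at the image edges."""
    return [row[box.x:box.x + box.w] for row in image[box.y:box.y + box.h]]


def extract_fields(image: Image, template: Sequence[FieldBox],
                   ref_size: Tuple[int, int], ocr: Callable[[Image], str]) -> Dict[str, str]:
    """Crop every annotated field from the scan and run the supplied OCR callable on it."""
    img_size = (len(image[0]), len(image))  # (width, height)
    return {box.name: ocr(crop(image, scale_box(box, ref_size, img_size))).strip()
            for box in template}


def field_accuracy(predicted: str, expected: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in metric, not necessarily the paper's."""
    return SequenceMatcher(None, predicted, expected).ratio()


def average_accuracy(predicted: Dict[str, str], expected: Dict[str, str]) -> float:
    """Mean per-field similarity over the ground-truth fields (missing fields score 0)."""
    scores = [field_accuracy(predicted.get(name, ""), value) for name, value in expected.items()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Reference invoice annotated at 100x100 px; the new scan is 200x200 px.
    template = [FieldBox("invoice_no", 60, 10, 30, 10), FieldBox("total", 60, 80, 30, 10)]
    scan = [[255] * 200 for _ in range(200)]

    def fake_ocr(region: Image) -> str:
        # Placeholder OCR that reports the crop size instead of reading text.
        return f"{len(region[0])}x{len(region)}"

    print(extract_fields(scan, template, (100, 100), fake_ocr))  # {'invoice_no': '60x20', 'total': '60x20'}
    print(field_accuracy("INV-1023", "INV-1028"))  # 0.875
```

Rescaling by the ratio of image sizes assumes the scan is neither rotated nor shifted relative to the reference invoice. Real scans usually need deskewing or alignment before cropping, which is where preprocessing of the kind the paper investigates would come in.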