Abstract

In the information age, quickly obtaining information and extracting its key parts from massive, complex resources has become a challenge. Extracting information from scanned or captured documents is one of the most demanding processes in many areas, such as finance, accounting, and taxation. Recent achievements in computer vision have substantially improved Optical Character Recognition (OCR), including the text detection and text recognition tasks. However, current OCR faces two challenges. The first is the quality of input data captured by mobile phones, which is strongly affected by external factors such as lighting conditions, a dynamic environment, or blurry content. The second is Key Information Extraction (KIE) from documents, a downstream task of OCR that has been largely underexplored. Input documents carry not only textual features extracted by OCR systems but also semantic visual features, which play a critical role in KIE yet are not fully utilized. In this paper, we propose an end-to-end system built on several state-of-the-art models from both computer vision and natural language processing to address the Mobile-Captured Receipt OCR (MC-OCR) challenge. The challenge comprises two tasks: (1) evaluating the quality of a captured receipt, and (2) recognizing the required fields of the receipt. Our code is publicly available at https://github.com/ndcuong9/MC_OCR
