Abstract

The use of electronic devices is increasing in the digital era, as people rely on them to perform their day-to-day activities. More and more sectors, such as government offices, banks, and educational institutions, use digital information for ease of storage and access. The world is moving towards paperless offices rather than physical paper documents. Paper documents, once scanned, are converted into document images. These document images often have complex layouts, which makes them difficult to process. Processing such images is known as the document image analysis (DIA) problem. It has been observed that document image datasets are growing rapidly and range from simple to very complex in structure, which leads to complex processing problems. To address these issues, we propose a deep learning based approach for layout equivalence detection, which can help solve retrieval problems and may also be useful in digital forensics based on document layouts. The proposed approach follows two stages: the first performs object detection with a deep learning model and creates bounding boxes for the different entities, which are then used to compare the layouts of document images and predict matching layouts for equivalence. The proposed model thus combines object detection, bounding box creation, and layout extraction, and uses a brute-force matcher to perform equivalence detection. We also present the state-of-the-art document image datasets that are widely used in the literature for solving DIA problems. Some publicly available datasets are used to analyze the results, which show the robustness of the proposed approach.
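
To make the layout comparison step concrete, the following is a minimal sketch of how two sets of detected bounding boxes might be compared exhaustively. It is an illustration under stated assumptions, not the method of the paper: the abstract does not specify what the brute-force matcher compares, so this sketch swaps in an intersection-over-union (IoU) comparison of same-label boxes. The `Box` format, the function names, and the `threshold` value of 0.7 are all hypothetical, and the boxes are assumed to come from an upstream object detector with coordinates normalized to the page size.

```python
from typing import List, Tuple

# Hypothetical box format: (label, x_min, y_min, x_max, y_max),
# with coordinates normalized to [0, 1] relative to the page size.
Box = Tuple[str, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter_h = max(0.0, min(a[4], b[4]) - max(a[2], b[2]))
    inter = inter_w * inter_h
    area_a = (a[3] - a[1]) * (a[4] - a[2])
    area_b = (b[3] - b[1]) * (b[4] - b[2])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def one_way_score(src: List[Box], dst: List[Box]) -> float:
    """Average, over boxes in src, of the best IoU with a same-label box in dst.

    Every box in src is compared against every box in dst (exhaustive search).
    """
    if not src:
        return 0.0
    total = 0.0
    for s in src:
        total += max((iou(s, d) for d in dst if d[0] == s[0]), default=0.0)
    return total / len(src)


def layout_similarity(a: List[Box], b: List[Box]) -> float:
    """Symmetric similarity: the weaker of the two one-way scores."""
    return min(one_way_score(a, b), one_way_score(b, a))


def layouts_equivalent(a: List[Box], b: List[Box], threshold: float = 0.7) -> bool:
    """Assumed decision rule: equivalent if similarity reaches the threshold."""
    return layout_similarity(a, b) >= threshold


# Illustrative layouts (made-up coordinates, not taken from any dataset).
page_a = [("title", 0.1, 0.05, 0.9, 0.15), ("text", 0.1, 0.2, 0.9, 0.9)]
page_b = [("title", 0.1, 0.05, 0.9, 0.15), ("text", 0.1, 0.2, 0.9, 0.9)]
page_c = [("title", 0.1, 0.05, 0.9, 0.15), ("table", 0.1, 0.2, 0.9, 0.9)]

print(layouts_equivalent(page_a, page_b))
print(layouts_equivalent(page_a, page_c))
```

Taking the minimum of the two one-way scores is a design choice in this sketch: it prevents a page with few regions from being judged equivalent to a page that merely contains those regions among many others.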

