Automatic semantic knowledge extraction from electronic forms

Haolin Wu, Tim French, Melinda Hodkiewicz, Wei Liu

doi:10.1177/1748006x221098272

Abstract

Electronic tabular forms are an intuitive way for organisations to collect, present and store structured information for human readers. Forms use features such as fonts, colours and cell positioning to help readers navigate and find information. Millions of forms, typically in Portable Document Format (PDF), are generated by businesses as part of routine operations. Unlike human readers, machines are not able to directly ‘understand’ the implicit cues contained in the fonts, colours and use of boxes without explicit processing. In this paper, a supervised computer vision model is proposed to decompose the PDF form document into nested microtables. The cells within these microtables are then processed using a customisable rule bank for meaningful table content and semantic relationship extraction. The process is demonstrated on an industry dataset of 37 maintenance procedure documents containing 373 pages and 1016 unique microtables. A web application EMU (Extracting Machine Understandable Semantics from Forms) demonstrates how data captured in tables with different dimensions in procedural forms can be automatically extracted and stored in JavaScript Object Notation (JSON). Identifying and extracting nested tables is a critical fundamental step for future applications to support machine-automated search and extraction of data at scale for both maintenance and other procedural documentation.
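The rule-bank step described in the abstract can be illustrated with a minimal sketch. It assumes the vision model has already split a page into microtables, each represented here as a list of rows of cell strings; that representation, the two rules, the function names and the sample cell contents are all hypothetical and are not taken from the paper or from EMU. Rules are tried in order, so a two-column table is treated as label/value pairs before the header rule is considered.

```python
import json

def key_value_rule(table):
    """Hypothetical rule: every row is a [label, value] pair."""
    if table and all(len(row) == 2 for row in table):
        return {row[0].strip().rstrip(":").strip(): row[1].strip() for row in table}
    return None

def header_rows_rule(table):
    """Hypothetical rule: one header row followed by equally wide data rows."""
    if len(table) >= 2 and all(len(row) == len(table[0]) for row in table):
        header = [h.strip() for h in table[0]]
        return [dict(zip(header, (c.strip() for c in row))) for row in table[1:]]
    return None

# The "rule bank" is an ordered list; the first rule that matches wins.
RULE_BANK = [key_value_rule, header_rows_rule]

def extract(table):
    """Apply the rule bank to one microtable and return a JSON-ready dict."""
    for rule in RULE_BANK:
        content = rule(table)
        if content is not None:
            return {"rule": rule.__name__, "content": content}
    # No rule matched: keep the raw cells so nothing is lost.
    return {"rule": None, "content": table}

# Illustrative microtables (invented cell contents).
details = [["Equipment:", "Pump P-101"], ["Task:", "Inspect seal"]]
steps = [["Step", "Action", "Initials"],
         ["1", "Isolate pump", ""],
         ["2", "Check seal", ""]]

if __name__ == "__main__":
    print(json.dumps([extract(details), extract(steps)], indent=2))
```

Keeping the rules as plain functions in a list is one way to make the bank customisable: adding support for a new form layout means appending one function rather than changing the extraction loop.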


Journal: Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability
Publication Date: May 23, 2022
Citations: 2

Similar Papers

Automated Extraction of Bioclimatic Time Series from PDF Tables
Sabino Maggi ... Saverio Vicario
15 May 2023

Automatic extraction and visualization of semantic relations between medical entities from medicine instructions
Maofu Liu ... Li Jiang
Multimedia Tools and Applications | VOL. 76
01 Dec 2015

Automatic knowledge extraction of any Chatbot from conversation
Sasa Arsovski ... Adrian David Cheok
Expert Systems With Applications | VOL. 137
08 Jul 2019

An Approach Based on Patterns for Synonymy Relations Detection
M Tovar ... G Flores-Petlacalco
Journal of Physics: Conference Series | VOL. 1828
01 Feb 2021
