Layout Aware Semantic Element Extraction for Sustainable Science & Technology Decision Support

Hyuntae Kim, Soyoung Park, Yuchul Jung, Jongyun Choi

doi:10.3390/su14052802

Abstract

New scientific and technological (S&T) knowledge is being introduced rapidly, and hence the effort required to understand and analyze newly published S&T documents is increasing daily. Automated text mining and vision recognition techniques alleviate the burden somewhat, but the variety of document layout formats and knowledge content granularities across S&T fields makes the task challenging. Therefore, this paper proposes LA-SEE (LAME and Vi-SEE), a knowledge graph construction framework that simultaneously extracts meta-information and useful image objects from S&T documents in various layout formats. We adopt Layout-aware Metadata Extraction (LAME), which can accurately extract metadata from various layout formats, and implement a transformer-based instance segmentation model (i.e., Vision-based Semantic Elements Extraction (Vi-SEE)) to maximize vision-based semantic element recognition. Moreover, to construct a scientific knowledge graph consisting of multiple S&T documents, we newly defined an extensible Semantic Elements Knowledge Graph (SEKG) structure. So far, we have extracted about 6 million semantic elements from 49,649 PDFs. In addition, to illustrate the potential power of our SEKG, we provide two promising application scenarios: a scientific knowledge guide across multiple S&T documents and question answering over scientific tables.
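The summary does not specify the SEKG schema, so the following is only a minimal sketch of the general idea under stated assumptions: documents and detected semantic elements (tables, figures) become graph nodes, each kept element is linked to its source document, and a cross-document query finds documents by element type and caption text. The node types, the `hasElement` relation name, the `(label, bbox, score, caption)` detection tuple, and the 0.5 confidence threshold are all hypothetical stand-ins, not the authors' SEKG or Vi-SEE output format.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A graph node: a document or a semantic element (hypothetical schema)."""
    node_id: str
    node_type: str        # e.g. "Document", "Table", "Figure" (assumed types)
    label: str            # title for documents, caption for elements
    bbox: tuple = ()      # (x0, y0, x1, y1) page coordinates for elements


@dataclass
class SemanticElementGraph:
    """Minimal adjacency-set graph keyed by (source id, relation name)."""
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        self.edges[(src, relation)].add(dst)

    def neighbors(self, src: str, relation: str) -> list:
        # .get avoids creating empty entries in the defaultdict
        return sorted(self.edges.get((src, relation), set()))


def add_detections(graph, doc_id, detections, min_score=0.5):
    """Attach detected elements to a document node.

    `detections` is a list of (label, bbox, score, caption) tuples standing in
    for the output of an instance segmentation model; `min_score` is an assumed
    confidence threshold. Returns the number of elements kept.
    """
    kept = 0
    for i, (label, bbox, score, caption) in enumerate(detections):
        if score < min_score:
            continue  # drop low-confidence detections
        elem_id = f"{doc_id}/{label.lower()}/{i}"
        graph.add_node(Node(elem_id, label, caption, tuple(bbox)))
        graph.add_edge(doc_id, "hasElement", elem_id)
        kept += 1
    return kept


def documents_with_element(graph, element_type, keyword):
    """Return ids of documents having an element of `element_type` whose
    caption contains `keyword` (case-insensitive)."""
    hits = set()
    for (src, relation), dsts in graph.edges.items():
        if relation != "hasElement":
            continue
        for dst in dsts:
            node = graph.nodes[dst]
            if node.node_type == element_type and keyword.lower() in node.label.lower():
                hits.add(src)
    return sorted(hits)


# Usage: two documents; one low-confidence detection is filtered out.
g = SemanticElementGraph()
g.add_node(Node("doc1", "Document", "Paper A"))
g.add_node(Node("doc2", "Document", "Paper B"))
add_detections(g, "doc1", [
    ("Table", (10, 20, 300, 200), 0.92, "Table 1: F1-score per model"),
    ("Figure", (10, 220, 300, 400), 0.31, "Figure 1: pipeline overview"),
])
add_detections(g, "doc2", [("Table", (0, 0, 100, 100), 0.88, "Table 2: dataset statistics")])
print(documents_with_element(g, "Table", "f1"))  # ['doc1']
print(g.neighbors("doc1", "hasElement"))         # ['doc1/table/0']
```

Keying edges by `(source, relation)` keeps the sketch dependency-free; a production knowledge graph would more likely use an RDF store or a property-graph database, where the same query becomes a pattern match over typed nodes and edges.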

Highlights

Decision support systems or specific methods for science and technology (S&T) problems or social issues can be employed effectively by various types of domain users, for tasks such as policymaking, research topic search, research method surveys, comparison of experimental results, and emerging technology trend analysis
Junior researchers may have difficulty collecting target information due to a lack of domain knowledge
We propose a layout-aware semantic element extraction (LA-SEE) framework that can extract meta and semantic knowledge from S&T documents and construct a knowledge graph (KG) from the extracted semantic elements
We propose two user scenarios based on the proposed Semantic Elements Knowledge Graph (SEKG) to confirm promising applications

Summary

Introduction

Decision support systems or specific methods for science and technology (S&T) problems or social issues can be employed effectively by various types of domain users, for tasks such as policymaking, research topic search, research method surveys, comparison of experimental results, and emerging technology trend analysis. To resolve these limitations, this study aims to enable a sophisticated decision support system by extracting semantic elements from S&T documents and constructing a knowledge graph from those elements. The recently proposed SciNLP-KG performs end-to-end natural language processing (NLP) KG construction over 30,000 NLP papers, focusing on four extracted relationship types among tasks, datasets, and evaluation metrics; however, its relationship extraction modules still achieve an F1-score below 80%. Liu et al. [9] defined a metaknowledge architecture to construct structural knowledge from documents, which differs from previous KGs but is similar to the present paper's approach. They employed a multi-modal metaknowledge extraction model to extract and organize metaknowledge elements (e.g., titles, authors, abstracts, and sections) from a government policy document dataset and DocBank [10]. We propose two user scenarios based on the proposed SEKG to confirm promising applications.

Metadata Extraction from Articles

Vision-Based Document Analysis

Scientific Knowledge Extraction

Document Modeling

LA-SEE Framework

Vi-SEE

ISTR Selection

Post-Processing Identified Semantic Elements

Organizing Knowledge with SEKG for Multiple Documents

Data for Analysis

Dataset for Vi-SEE

Proposed LA-SEE Performance

Constructed Semantic Element Statistics

Decision Support Applications in Science and Technology Domain

Scientific Knowledge Guide

Conclusions and Future Work

Findings

Conclusions and Elements


Journal: Sustainability	Publication Date: Feb 28, 2022
Citations: 1	License type: CC BY 4.0


Similar Papers

Using Named Graphs and Knowledge Graph Template Patterns for Efficiently Organizing FAIR Anatomy Data and Metadata
Lars Vogt ... Roman Baum
Biodiversity Information Science and Standards | VOL. 3 | 19 Jun 2019

An Intelligent System for Semantic Information Extraction and Knowledge Graph Construction from Multi-Type Data Sources
Hanrong Zhang ... Bo Qin
01 Oct 2022

From Data to Knowledge: A semantic knowledge graph application for curating specimen data
Peter Grobe ... Christian Köhler
Biodiversity Information Science and Standards | VOL. 3 | 26 Jun 2019

Integrating AI with medical industry chain data: enhancing clinical nutrition research through semantic knowledge graphs.
Deng Chen ... Chengjie Lu
Frontiers in digital health | VOL. 6 | 03 Oct 2024
