Abstract

New scientific and technological (S&T) knowledge is being introduced rapidly, and the effort required to understand and analyze newly published S&T documents grows daily. Automated text mining and vision recognition techniques alleviate the burden somewhat, but the variety of document layout formats and knowledge content granularities across the S&T field makes the task challenging. Therefore, this paper proposes LA-SEE (LAME and Vi-SEE), a knowledge graph construction framework that simultaneously extracts meta-information and useful image objects from S&T documents in various layout formats. We adopt Layout-aware Metadata Extraction (LAME), which can accurately extract metadata from various layout formats, and implement a transformer-based instance segmentation model, Vision-based Semantic Elements Extraction (Vi-SEE), to maximize vision-based semantic element recognition. Moreover, to construct a scientific knowledge graph spanning multiple S&T documents, we define an extensible Semantic Elements Knowledge Graph (SEKG) structure. To date, we have extracted about 6 million semantic elements from 49,649 PDFs. In addition, to illustrate the potential of our SEKG, we provide two promising application scenarios: a scientific knowledge guide across multiple S&T documents and question answering over scientific tables.

Highlights

  • Decision support systems or specific methods for science and technology (S&T) problems or social issues can be employed effectively by users in various domains for policymaking, research topic search, research method surveys, comparison of experimental results, emerging technology trend analyses, etc. Junior researchers may have difficulty collecting target information because they lack domain knowledge

  • We propose a layout-aware semantic element extraction (LA-SEE) framework that can extract meta and semantic knowledge from S&T documents and construct a knowledge graph (KG) from the extracted semantic elements

  • We propose two user scenarios based on the proposed Semantic Elements Knowledge Graph (SEKG) to confirm promising applications


Summary

Introduction

Decision support systems or specific methods for science and technology (S&T) problems or social issues can be employed effectively by users in various domains for policymaking, research topic search, research method surveys, comparison of experimental results, emerging technology trend analyses, etc. However, junior researchers may have difficulty collecting target information because they lack domain knowledge. To resolve such limitations, this study aims to enable a sophisticated decision support system by extracting semantic elements from S&T documents and constructing a knowledge graph (KG) with the semantic elements. The recently proposed SciNLP-KG performs end-to-end natural language processing (NLP) KG construction over 30,000 NLP papers, focusing on four relationship types extracted among tasks, datasets, and evaluation metrics. However, its relationship extraction modules achieved F1-scores below 80%. Liu et al. [9] defined a metaknowledge architecture to construct structural knowledge with documents, in contrast with previous KGs but similar to the present paper’s approach. They employed a multi-modal metaknowledge extraction model to extract and organize metaknowledge elements (e.g., titles, authors, abstracts, and sections) from a government policy document dataset and DocBank [10]. We propose two user scenarios based on the proposed SEKG to confirm promising applications.
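
To make the idea of a knowledge graph built from semantic elements of multiple documents more concrete, the following is a minimal Python sketch. It is not the SEKG schema defined in the paper: the class names, the element type labels, the "has_element" relation, and the sample documents are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SemanticElement:
    """One extracted element. The type labels are illustrative assumptions."""
    doc_id: str
    element_type: str  # e.g. "title", "author", "table", "figure"
    content: str


@dataclass
class SemanticElementGraph:
    """Toy graph: document nodes linked to their semantic element nodes."""
    elements: dict = field(default_factory=dict)
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_element(self, element_id, element):
        self.elements[element_id] = element
        # Hypothetical relation name linking a document to one of its elements.
        self.edges[element.doc_id].append(("has_element", element_id))

    def elements_of_type(self, element_type):
        # Collect matching elements across every document in the graph.
        return [e for e in self.elements.values()
                if e.element_type == element_type]

    def documents_sharing(self, element_type, content):
        # Documents that contain an element with the same type and content.
        return sorted({e.doc_id for e in self.elements.values()
                       if e.element_type == element_type
                       and e.content == content})


# Made-up sample data: two documents that share one author.
graph = SemanticElementGraph()
graph.add_element("e1", SemanticElement("doc-1", "title", "Paper One"))
graph.add_element("e2", SemanticElement("doc-1", "author", "A. Author"))
graph.add_element("e3", SemanticElement("doc-2", "author", "A. Author"))
graph.add_element("e4", SemanticElement("doc-2", "table", "Table 1: results"))

shared = graph.documents_sharing("author", "A. Author")
tables = graph.elements_of_type("table")
```

Queries such as `documents_sharing` hint at how elements extracted from separate PDFs could be connected for cross-document guidance, while `elements_of_type("table")` gathers the kind of elements that a table question answering scenario would start from.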

Metadata Extraction from Articles
Vision-Based Document Analysis
Scientific Knowledge Extraction
Document Modeling
LA-SEE Framework
Vi-SEE
ISTR Selection
Post-Processing Identified Semantic Elements
Organizing Knowledge with SEKG for Multiple Documents
Data for analysis
Dataset for Vi-SEE
Proposed LA-SEE Performance
Constructed Semantic Element Statistics
Decision Support Applications in Science and Technology Domain
Scientific Knowledge Guide
Conclusions and Future Work
Findings
Conclusions and elements
