SAO2Vec: Development of an algorithm for embedding the subject-action-object (SAO) structure using Doc2Vec.

Sunhye Kim, Byungun Yoon, Inchae Park

doi:10.1371/journal.pone.0227930

Open Access

Journal: PLOS ONE
Publication Date: Feb 5, 2020
Citations: 24
License type: CC BY 4.0

Affiliation: Dongguk University, Hansung University

Abstract

In natural-language processing, the subject–action–object (SAO) structure is used to convert unstructured text into structured data made up of subjects, actions, and objects. This structure is well suited to analyzing the key elements of a technology and the relationships between those elements. However, analysis based on the existing SAO structure requires substantial manual processing because the structure does not capture the context of the sentences it comes from. We therefore introduce SAO2Vec, a method that uses SAO structures to embed sentences and documents as vectors for text mining of technical documents. First, the technical documents of interest are collected, and SAO structures are extracted from them. Next, sentence vectors are obtained with the Doc2Vec algorithm and updated using the word vectors of the terms in each sentence's SAO structure. Then, each SAO vector is derived from the updated vectors of the sentences that share that SAO structure. Finally, document vectors are derived from the SAO vectors of each document. In an experiment in the Internet of things (IoT) field, the SAO2Vec method achieved 3.1% higher accuracy than the Doc2Vec method and 115.0% higher accuracy than SAO frequency alone. These results indicate that SAO2Vec can improve grouping and similarity analysis by capturing both the meanings and the contexts of technical elements.
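The pipeline in the abstract (update sentence vectors with SAO word vectors, average by SAO, then aggregate per document) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The sentence and word vectors are assumed to be precomputed, for example by a trained Doc2Vec model. The blending rule in `update_sentence_vector` and its weight `alpha` are hypothetical, because the abstract does not give the exact update formula. Plain averaging for the SAO and document vectors is also an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Vector = List[float]
SAO = Tuple[str, str, str]  # (subject, action, object)


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def update_sentence_vector(sent_vec: Vector, sao: SAO,
                           word_vecs: Dict[str, Vector],
                           alpha: float = 0.5) -> Vector:
    """HYPOTHETICAL update rule: blend the sentence vector with the mean of
    the word vectors for the sentence's subject, action, and object.
    alpha is an assumed mixing weight, not a value from the paper."""
    sao_mean = mean([word_vecs[term] for term in sao])
    return [(1 - alpha) * s + alpha * w for s, w in zip(sent_vec, sao_mean)]


def sao_vectors(sentences: Iterable[Tuple[Vector, SAO]],
                word_vecs: Dict[str, Vector],
                alpha: float = 0.5) -> Dict[SAO, Vector]:
    """Group updated sentence vectors by SAO triple and average each group."""
    groups: Dict[SAO, List[Vector]] = defaultdict(list)
    for sent_vec, sao in sentences:
        groups[sao].append(update_sentence_vector(sent_vec, sao, word_vecs, alpha))
    return {sao: mean(vecs) for sao, vecs in groups.items()}


def document_vector(doc_saos: Sequence[SAO],
                    sao_vecs: Dict[SAO, Vector]) -> Vector:
    """Assumed aggregation: average the SAO vectors of the document's triples."""
    return mean([sao_vecs[sao] for sao in doc_saos])


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for trained Doc2Vec output.
    word_vecs = {"sensor": [1.0, 0.0], "collects": [0.0, 1.0], "data": [1.0, 1.0]}
    triple = ("sensor", "collects", "data")
    vecs = sao_vectors([([0.0, 0.0], triple), ([1.0, 1.0], triple)], word_vecs)
    print(vecs[triple])                      # both components about 0.5833 (7/12)
    print(document_vector([triple], vecs))   # same, since the document has one SAO
```

Averaging keeps every SAO vector and document vector in the same space as the Doc2Vec sentence vectors. Documents can then be compared directly, for example with cosine similarity. Weighted schemes are an equally plausible alternative.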

Highlights

As the information society grows more sophisticated and large volumes of technical literature are produced, analyzing the implications of that literature has become increasingly important
Document vectors based on SAO frequency use only how often each SAO structure occurs; their length grows with the number of distinct SAO structures, because each feature corresponds to one SAO
We developed SAO2Vec, an algorithm for embedding SAO structures based on the Doc2Vec learning method
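The SAO-frequency baseline described in the highlights can be sketched as follows: every distinct SAO triple becomes one feature, and a document is represented by the counts of its triples. The toy triples below are invented for illustration. The example shows one consequence of this representation: two documents whose triples mean nearly the same thing ("sensor collects data" vs. "device gathers data") still get a cosine similarity of zero, because distinct triples are treated as unrelated features.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

SAO = Tuple[str, str, str]  # (subject, action, object)


def build_sao_index(corpus: Sequence[Sequence[SAO]]) -> Dict[SAO, int]:
    """Give every distinct SAO triple in the corpus its own feature index."""
    index: Dict[SAO, int] = {}
    for doc in corpus:
        for sao in doc:
            if sao not in index:
                index[sao] = len(index)
    return index


def sao_frequency_vector(doc: Sequence[SAO], index: Dict[SAO, int]) -> List[int]:
    """Count each indexed SAO triple in the document (unseen triples raise KeyError)."""
    vec = [0] * len(index)
    for sao, count in Counter(doc).items():
        vec[index[sao]] = count
    return vec


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    doc_a = [("sensor", "collects", "data"), ("sensor", "collects", "data"),
             ("gateway", "sends", "data")]
    doc_b = [("device", "gathers", "data")]
    index = build_sao_index([doc_a, doc_b])
    vec_a = sao_frequency_vector(doc_a, index)   # [2, 1, 0]
    vec_b = sao_frequency_vector(doc_b, index)   # [0, 0, 1]
    print(len(index), vec_a, vec_b, cosine(vec_a, vec_b))  # 3 [2, 1, 0] [0, 0, 1] 0.0
```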

Summary

Introduction

As the information society grows more sophisticated and large volumes of technical literature are produced, analyzing the implications of that literature has become increasingly important. Technical documents, such as patents, technical reports, and product descriptions, are written to record scientific or technical knowledge. They contain rich information on science and technology, including practical examples and trends, and this information can be processed and used for many purposes. Many text-mining researchers have therefore proposed approaches for extracting important content from documents.

Similar Papers

SAO Semantic Information Identification for Text Mining
Chao Yang ... Xuefeng Wang
International Journal of Computational Intelligence Systems | VOL. 10 | 01 Jan 2017

Extraction of Knowledge and Processing of the Patent Array
Marina Fomenkova ... Alla G Kravets
01 Jan 2019

Exploring Technology Opportunities Based on User Needs: Application of Opinion Mining and SAO Analysis
Hyejin Jang ... Byungun Yoon
Engineering Management Journal | VOL. 35 | 05 May 2022

Generic SAO Similarity Measure via Extended Sørensen-Dice Index
Xiaoman Li ... Xuefu Zhang
IEEE Access | VOL. 8 | 01 Jan 2020