SAO Semantic Information Identification for Text Mining

Chao Yang, Donghua Zhu, Xuefeng Wang

doi:10.2991/ijcis.2017.10.1.40


Open Access


Abstract

A Subject-Action-Object (SAO) is a triple structure that can be used both to describe topics in detail and to explore the relationships between them. SAO analysis has become popular in bibliometrics; however, there are two challenges in identifying SAO structures: the low relevance of SAOs to domain topics, and synonyms within SAOs. These problems make SAO identification heavily dependent on domain experts, which limits the wider use of SAO and hinders the further mining of SAO characteristics. This paper proposes a parse tree-based SAO identification method that includes (1) a model to identify the core components (candidate terms for subject and object) of SAO structures, using term clumping and co-word analysis; (2) a parse tree-based hierarchical SAO extraction model that divides the extraction of an entire SAO structure into simpler sub-tasks for separately identifying the subject, action, and object; and (3) an SAO weighting model that ranks SAO structures for result selection. The proposed method is applied to publications in the journal Scientometrics (SCIM) to identify and rank significant SAO structures. The experimental results demonstrate the validity and feasibility of the proposed method.
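To make the triple structure and the idea of ranking SAOs for result selection concrete, here is a minimal Python sketch. The paper's actual weighting model is not given in this section, so the scheme below is a hypothetical stand-in: the weight of a distinct SAO is its occurrence count multiplied by the mean relevance of its subject and object terms, where `term_weight` is an assumed, externally supplied term-to-score mapping (for example, the output of a co-word relevance step).

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple


class SAO(NamedTuple):
    """A Subject-Action-Object triple."""
    subject: str
    action: str
    obj: str


def rank_saos(saos: List[SAO], term_weight: Dict[str, float]) -> List[Tuple[SAO, float]]:
    """Rank distinct SAO triples by a hypothetical weight (not the paper's model).

    weight(sao) = frequency(sao) * (relevance(subject) + relevance(object)) / 2
    Terms missing from term_weight are treated as having relevance 0.0.
    """
    counts = Counter(saos)  # frequency of each distinct triple

    def weight(sao: SAO) -> float:
        relevance = (term_weight.get(sao.subject, 0.0) + term_weight.get(sao.obj, 0.0)) / 2
        return counts[sao] * relevance

    # Highest weight first; ties keep first-seen order because sorting is stable.
    return sorted(((s, weight(s)) for s in counts), key=lambda pair: pair[1], reverse=True)
```

Given a toy list in which `SAO("citation analysis", "measure", "research impact")` occurs twice and `SAO("co-word analysis", "reveal", "topic structure")` once, with illustrative relevance scores for those four terms, `rank_saos` returns the repeated triple first whenever its frequency-weighted relevance is higher. Multiplying frequency by relevance is only one plausible design; it favours triples that are both common and on-topic.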

Highlights

An SAO is a triple structure extracted from a text corpus
Scientometrics is a leading journal in the field of Information Science & Library Science that balances theoretical research with empirical studies, and information science with management needs. Its publications cover a rich variety of strongly coupled topics, which makes them well suited for testing SAO identification and for comparing the resulting insights with those of traditional bibliometric techniques
Existing SAO research has predominantly focused on applications rather than on SAO extraction techniques

Summary

Introduction

An SAO is a triple structure extracted from a text corpus. Subjects and objects are terms or phrases that are closely related to the topic. SAO is helpful for (1) resolving ambiguous interpretations caused by homonyms and synonyms [17, 18]; and (2) identifying the specific relationships between topic terms [20]. Compared with traditional SAO identification methods, the main contributions of the proposed method are: (1) introducing term clumping and designing a co-word algorithm (which considers co-occurrence with keywords) to identify SAO core components, which helps improve the relevance of SAOs to the topic.
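The combination of term clumping and a keyword-aware co-word measure described above can be illustrated with a small Python sketch. Both functions are hypothetical simplifications, not the paper's algorithm: `clump` only lowercases, treats hyphens as spaces, collapses whitespace and applies an assumed hand-made synonym table, and `coword_relevance` scores a candidate core component by the fraction of documents containing it that also contain at least one keyword other than the candidate itself.

```python
import re
from typing import Dict, Iterable, List, Set


def clump(term: str, synonyms: Dict[str, str]) -> str:
    """Normalise a raw term (a tiny stand-in for term clumping).

    Lowercases, treats hyphens as spaces, collapses whitespace, and then maps
    known variants to one canonical form via an assumed synonym table.
    """
    normalised = re.sub(r"[\s\-]+", " ", term.lower()).strip()
    return synonyms.get(normalised, normalised)


def coword_relevance(docs: List[Set[str]], candidates: Iterable[str],
                     keywords: Set[str]) -> Dict[str, float]:
    """Score candidates by co-occurrence with keywords (hypothetical measure).

    score = (#docs containing the candidate and at least one keyword other than it)
            / (#docs containing the candidate); 0.0 if the candidate never appears.
    Each document is a set of already-clumped terms.
    """
    scores: Dict[str, float] = {}
    for term in candidates:
        containing = [doc for doc in docs if term in doc]
        if not containing:
            scores[term] = 0.0
            continue
        co_occurring = sum(1 for doc in containing if (doc - {term}) & keywords)
        scores[term] = co_occurring / len(containing)
    return scores
```

For example, with the synonym table `{"cocitation analysis": "co citation analysis"}`, the variants "Co-citation  analysis" and "cocitation analysis" both clump to "co citation analysis", so their document counts are pooled before scoring. Candidates with high scores would then be kept as subject and object terms; a real term clumping pipeline would add steps such as stop-word removal and fuzzy matching.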

Symbolic SAO extraction approach

Statistical SAO extraction approach

Methodology

Components identification model

SAO extraction model

Else, return a non-complete SAO.
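The hierarchical, parse tree-based extraction, which splits the task into separate subject, action and object sub-tasks and falls back to a non-complete SAO when a part is missing, can be sketched in Python as follows. This is an illustrative simplification, not the paper's algorithm: it assumes a Penn Treebank-style constituency tree encoded as nested tuples `(label, *children)` with plain word strings as leaves, and the helper names `leaves`, `child` and `extract_sao` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Union

Tree = Union[str, tuple]  # a leaf word, or (label, *children)


class SAOResult(NamedTuple):
    subject: Optional[str]
    action: Optional[str]
    obj: Optional[str]
    complete: bool


def leaves(tree: Tree) -> List[str]:
    """Collect the words under a (sub)tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for c in tree[1:] for w in leaves(c)]


def child(tree: tuple, prefix: str) -> Optional[tuple]:
    """First non-leaf child whose label starts with prefix (e.g. 'NP', 'VB')."""
    for c in tree[1:]:
        if not isinstance(c, str) and c[0].startswith(prefix):
            return c
    return None


def extract_sao(sentence: tuple) -> SAOResult:
    # Sub-task 1: subject = first NP directly under the sentence node.
    np = child(sentence, "NP")
    subject = " ".join(leaves(np)) if np is not None else None
    # Sub-task 2: action = verb of the VP; descend through nested VPs so that
    # "have improved" yields the main verb "improved".
    vp = child(sentence, "VP")
    action = None
    while vp is not None:
        verb = child(vp, "VB")
        if verb is not None:
            action = " ".join(leaves(verb))
        inner = child(vp, "VP")
        if inner is None:
            break
        vp = inner
    # Sub-task 3: object = first NP inside the innermost VP.
    obj_np = child(vp, "NP") if vp is not None else None
    obj = " ".join(leaves(obj_np)) if obj_np is not None else None
    if subject and action and obj:
        return SAOResult(subject, action, obj, True)
    # Otherwise, return a non-complete SAO (missing parts stay None).
    return SAOResult(subject, action, obj, False)
```

For the tree `("S", ("NP", ("NN", "citation"), ("NN", "analysis")), ("VP", ("VBZ", "measures"), ("NP", ("NN", "research"), ("NN", "impact"))))` the sketch returns a complete SAO ("citation analysis", "measures", "research impact"), while an intransitive sentence such as "citation grows" yields a non-complete SAO with no object. Real parses also involve prepositional objects, passives and coordination, which this sketch does not handle.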

SAO weighting model

A case study

Components identification

SAO extraction and SAO weighting

Aim map intellectual structure

Findings

Discussion and conclusions


Journal: International Journal of Computational Intelligence Systems
Publication Date: Jan 1, 2017
Citations: 14
License type: cc-by

Similar Papers

SAO2Vec: Development of an algorithm for embedding the subject-action-object (SAO) structure using Doc2Vec.
Sunhye Kim ... Inchae Park
PLOS ONE | VOL. 15 | 05 Feb 2020

Exploring Technology Opportunities Based on User Needs: Application of Opinion Mining and SAO Analysis
Hyejin Jang ... Byungun Yoon
Engineering Management Journal | VOL. 35 | 05 May 2022

Extraction of Knowledge and Processing of the Patent Array
Marina Fomenkova ... Alla G Kravets
01 Jan 2019

Subject–action–object-based morphology analysis for determining the direction of technological change
Junfang Guo ... Donghua Zhu
Technological Forecasting and Social Change | VOL. 105 | 11 Feb 2016
