Abstract
A Subject-Action-Object (SAO) is a triple structure that can be used both to describe topics in detail and to explore the relationships between them. SAO analysis has become popular in bibliometrics; however, two challenges remain in identifying SAO structures: the low relevance of SAOs to domain topics, and synonyms within SAOs. These problems make SAO identification heavily dependent on domain experts, which limits wider use of SAOs and hinders further mining of SAO characteristics. This paper proposes a parse tree-based SAO identification method that includes (1) a model to identify the core components of SAO structures (candidate terms for subjects and objects), which involves term clumping processes and co-word analysis; (2) a parse tree-based hierarchical SAO extraction model that divides the extraction of an entire SAO structure into simpler sub-tasks for separate subject, action, and object identification; and (3) an SAO weighting model to rank SAO structures for result selection. The proposed method is applied to publications in the Journal of Scientometrics (SCIM) to identify and rank significant SAO structures. Our experimental results demonstrate the validity and feasibility of the proposed method.
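The idea of splitting SAO extraction into separate subject, action, and object sub-tasks over a parse tree can be illustrated with a minimal sketch. It is not the paper's extraction model: the tuple-based tree format, the Penn Treebank-style labels, and the helper names `leaves`, `find_child`, and `extract_sao` are all assumptions made for illustration, and a real pipeline would take its trees from a syntactic parser.

```python
from typing import List, Optional, Tuple

# Assumed toy tree format: a phrase is (label, child, child, ...),
# a leaf is (pos_tag, word). Labels follow Penn Treebank conventions.

def leaves(tree: tuple) -> List[str]:
    """Return the words under a node, left to right."""
    if len(tree) == 2 and isinstance(tree[1], str):
        return [tree[1]]
    words: List[str] = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

def find_child(tree: tuple, labels: set) -> Optional[tuple]:
    """Return the first direct child whose label is in `labels`, or None."""
    for child in tree[1:]:
        if child[0] in labels:
            return child
    return None

def extract_sao(sentence: tuple) -> Optional[Tuple[str, str, str]]:
    """Hypothetical hierarchical split: S -> NP + VP, then VP -> verb + NP."""
    subject = find_child(sentence, {"NP"})           # sub-task 1: subject
    verb_phrase = find_child(sentence, {"VP"})
    if subject is None or verb_phrase is None:
        return None
    action = find_child(verb_phrase, {"VB", "VBZ", "VBP", "VBD"})  # sub-task 2: action
    obj = find_child(verb_phrase, {"NP"})            # sub-task 3: object
    if action is None or obj is None:
        return None
    return (" ".join(leaves(subject)),
            " ".join(leaves(action)),
            " ".join(leaves(obj)))

# Hand-built tree for the sentence "term clumping improves topic relevance".
tree = ("S",
        ("NP", ("NN", "term"), ("NN", "clumping")),
        ("VP", ("VBZ", "improves"),
               ("NP", ("NN", "topic"), ("NN", "relevance"))))

print(extract_sao(tree))  # ('term clumping', 'improves', 'topic relevance')
```

Sentences without a direct object return `None` in this sketch, so they would simply be skipped rather than producing a partial triple.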
Highlights
An SAO is a triple structure extracted from a text corpus
Scientometrics is a leading journal in the field of Information Science & Library Science, and balances theoretical research with empirical studies, and information science with management needs. Its publications cover a rich variety of strongly coupled topics, which makes them well suited to our SAO identification and to comparing the resulting insights with those of traditional bibliometric techniques
Existing SAO research has predominantly focused on applications rather than on SAO extraction techniques
Summary
An SAO is a triple structure extracted from a text corpus. Subjects and objects are terms or phrases that are closely related to the topic. SAO is helpful for (1) resolving ambiguous interpretations caused by homonyms and synonyms [17, 18]; and (2) identifying the specific relationship between topic terms [20]. Compared with traditional SAO identification methods, the main contributions of the proposed method include: (1) introducing term clumping and designing a co-word algorithm (which considers co-occurrence with keywords) to identify SAO core components, which helps improve the relevance of SAOs to the topic.
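A minimal sketch of how co-occurrence with keywords might be used to select core components follows. It is not the paper's co-word algorithm: the scoring rule (the share of a term's documents that also contain a keyword), the function name `coword_scores`, the toy documents, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Set

def coword_scores(docs: List[Set[str]], keywords: Set[str]) -> Dict[str, float]:
    """Score each non-keyword term by the fraction of the documents
    containing it that also contain at least one keyword (assumed rule)."""
    term_docs: Counter = Counter()      # documents containing the term
    term_kw_docs: Counter = Counter()   # of those, documents that also hold a keyword
    for terms in docs:
        has_keyword = bool(terms & keywords)
        for term in terms - keywords:
            term_docs[term] += 1
            if has_keyword:
                term_kw_docs[term] += 1
    return {term: term_kw_docs[term] / term_docs[term] for term in term_docs}

# Toy corpus: each document is the set of terms left after term clumping.
docs = [
    {"citation analysis", "h-index", "research evaluation"},
    {"h-index", "peer review"},
    {"peer review", "funding"},
]
keywords = {"h-index"}

scores = coword_scores(docs, keywords)

# Hypothetical cut-off: keep terms that co-occur with a keyword in at
# least half of the documents they appear in.
core_components = sorted(t for t, s in scores.items() if s >= 0.5)
print(core_components)  # ['citation analysis', 'peer review', 'research evaluation']
```

Terms that never co-occur with a keyword, such as "funding" here, score 0.0 and are dropped as candidate subjects or objects.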
More From: International Journal of Computational Intelligence Systems