Abstract
In natural-language processing, the subject–action–object (SAO) structure is used to convert unstructured text into structured data comprising subjects, actions, and objects. This structure is suitable for analyzing the key elements of a technology, as well as the relationships between these elements. However, analysis using the existing SAO structure requires substantial manual work because the structure does not represent the context of the sentences. We therefore introduce SAO2Vec, in which SAO structures are used to embed sentences and documents as vectors for text mining of technical documents. First, the technical documents of interest are collected, and SAO structures are extracted from them. Then, sentence vectors are obtained with the Doc2Vec algorithm and updated using the word vectors of the SAO structure. Finally, each SAO vector is derived from the updated vectors of the sentences that share the same SAO structure, and document vectors are derived from each document's SAO vectors. The results of an experiment in the Internet of Things (IoT) field indicate that the SAO2Vec method produces 3.1% better accuracy than the Doc2Vec method and 115.0% better accuracy than SAO frequency alone. These results suggest that the proposed SAO2Vec algorithm can improve grouping and similarity analysis by capturing both the meanings and the contexts of technical elements.
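The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state how sentence vectors are updated or how vectors are aggregated, so the blending weight `alpha`, the use of simple averaging, and all function names are assumptions made for illustration. The toy sentence vectors stand in for vectors that would come from a trained Doc2Vec model.

```python
from collections import defaultdict


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def update_sentence_vector(sentence_vec, sao, word_vecs, alpha=0.5):
    """Blend a sentence vector with the mean word vector of its SAO terms.

    ASSUMPTION: a linear blend weighted by alpha. The source only says the
    sentence vector is "updated using word vectors in the SAO structure".
    """
    sao_mean = mean([word_vecs[word] for word in sao])
    return [(1 - alpha) * s + alpha * w for s, w in zip(sentence_vec, sao_mean)]


def build_sao_vectors(sentences, word_vecs, alpha=0.5):
    """Average the updated vectors of all sentences sharing the same SAO.

    sentences: list of (sao_tuple, sentence_vector) pairs.
    ASSUMPTION: averaging is used as the aggregation step.
    """
    grouped = defaultdict(list)
    for sao, vec in sentences:
        grouped[sao].append(update_sentence_vector(vec, sao, word_vecs, alpha))
    return {sao: mean(vecs) for sao, vecs in grouped.items()}


def document_vector(doc_saos, sao_vecs):
    """ASSUMPTION: a document vector is the mean of its SAO vectors."""
    return mean([sao_vecs[sao] for sao in doc_saos])


# Toy 2-dimensional vectors standing in for trained word and sentence vectors.
word_vecs = {
    "sensor": [1.0, 0.0],
    "collects": [0.0, 1.0],
    "data": [1.0, 1.0],
    "gateway": [0.0, 0.0],
    "sends": [2.0, 0.0],
}
sentences = [
    (("sensor", "collects", "data"), [0.0, 0.0]),
    (("sensor", "collects", "data"), [2.0, 2.0]),
    (("gateway", "sends", "data"), [1.0, 1.0]),
]

sao_vecs = build_sao_vectors(sentences, word_vecs)
doc_vec = document_vector(list(sao_vecs), sao_vecs)
print(sao_vecs)
print(doc_vec)
```

In practice the sentence and word vectors would be taken from a trained embedding model rather than written by hand. The sketch only shows how the three levels (sentence, SAO, document) could relate to one another.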
Highlights
Given the sophistication of the information society and the large volume of technical literature being created, it is important to analyze the implications of that literature
Document vectors based on SAO frequency use only the frequency of SAO structures; their size is proportional to the number of SAO structures, because each feature corresponds to one SAO
We developed SAO2Vec, an algorithm for embedding SAO structures based on the Doc2Vec learning method
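For comparison, the SAO-frequency baseline mentioned in the highlights can be sketched as follows. This is an illustrative reading of that highlight rather than the authors' code: it assumes each distinct SAO becomes one feature and each document is represented by raw SAO counts. The function name and the toy documents are hypothetical.

```python
from collections import Counter


def sao_frequency_vectors(docs):
    """Build count vectors with one feature per distinct SAO.

    docs: list of documents, each a list of (subject, action, object) tuples.
    Returns the sorted SAO vocabulary and one count vector per document.
    """
    vocab = sorted({sao for doc in docs for sao in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[sao] for sao in vocab])
    return vocab, vectors


# Hypothetical toy documents, already reduced to SAO tuples.
docs = [
    [
        ("sensor", "collect", "data"),
        ("sensor", "collect", "data"),
        ("gateway", "send", "data"),
    ],
    [("gateway", "send", "data")],
]

vocab, freq_vectors = sao_frequency_vectors(docs)
print(vocab)
print(freq_vectors)
```

Because the vector length equals the number of distinct SAOs, this representation grows with the corpus and carries no information about meaning or context, which is the limitation SAO2Vec is intended to address.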
Summary
Given the sophistication of the information society and the large volume of technical literature being created, it is important to analyze the implications of that literature. Technical documentation is written to record scientific or technical knowledge; it includes patent literature, technical reports, and product descriptions. These documents contain ample information about science and technology, as well as practical examples and trends, and this information can be processed and used for various purposes. Many text-mining researchers have suggested approaches for extracting important content from documents.