Abstract

Topic mining of scientific literature can accurately capture the contextual structure of a topic, track research hotspots within a field, and improve the availability of information about the literature. This paper introduces a multi-dimensional topic mining method based on a hierarchical semantic graph model. The main innovations include (1) the hierarchical extraction of feature terms and construction of a corresponding semantic graph and (2) multi-dimensional topic mining based on graph segmentation and structure analysis. The process of semantic graph construction is based primarily on hierarchical feature term extraction, which can effectively reveal the hierarchical structural distribution of feature terms within documents. Our graph model also takes into account the complementarity of content- and context-related feature terms in documents while avoiding the loss of textual information. In addition, the multi-dimensional features of the topic can be mined effectively via an in-depth analysis of the constructed graph, resulting in a quantitative visualization of the many-to-many association between the topic and feature terms. A variety of experiments on existing document datasets demonstrate that the proposed approach is able to outperform state-of-the-art methods in terms of accuracy and efficacy.

Highlights

  • With the rapid development of database and Web 2.0 technologies, the volume of literature available online is experiencing explosive growth

  • The main innovations include (1) hierarchical extraction of feature terms and construction of corresponding semantic graphs and (2) topic mining based on graph segmentation and structure analysis

  • Based on the concept of Latent Semantic Indexing (LSI), Hofmann [32] proposed a refinement known as probabilistic latent semantic analysis (PLSA)

Summary

INTRODUCTION

With the rapid development of database and Web 2.0 technologies, the volume of literature available online is experiencing explosive growth. The nodes and edges of a document graph can clearly reveal the complex relationships between feature terms and effectively highlight the documents' core information [16], [17]. In this method, as in other graph-based topic mining methods, the construction of a document graph is the foundation. The main innovations include (1) hierarchical extraction of feature terms and construction of corresponding semantic graphs and (2) topic mining based on graph segmentation and structure analysis. The constructed graph comprehensively represents the mutual relevance of feature terms based on both content and context while avoiding the loss of textual information; the hierarchical extraction of the terms used in the graph effectively reveals the hierarchical structure to which they belong.
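
The general idea of building a term graph from co-occurrence and then segmenting it can be illustrated with a minimal sketch. This is not the paper's algorithm: the sentence-level co-occurrence window, the `min_weight` threshold, the use of connected components as a stand-in for subgraph segmentation, and the function names `build_cooccurrence_graph` and `segment` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Return {(term_a, term_b): count} for terms that share a sentence.

    Assumption: co-occurrence is counted once per sentence, and each
    undirected edge is keyed by its two terms in alphabetical order.
    """
    weights = defaultdict(int)
    for terms in sentences:
        for a, b in combinations(sorted(set(terms)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def segment(weights, min_weight=2):
    """Drop edges lighter than min_weight, then return connected components.

    Assumption: connected components stand in for subgraph segmentation;
    the source does not specify the segmentation procedure here.
    """
    adjacency = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collecting every term reachable from start.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Toy input: each inner list holds the feature terms of one sentence.
sentences = [
    ["topic", "graph", "mining"],
    ["topic", "graph"],
    ["cluster", "metric"],
    ["cluster", "metric", "graph"],
]
graph = build_cooccurrence_graph(sentences)
topics = segment(graph, min_weight=2)
```

On this toy input, only the pairs that co-occur in at least two sentences survive the threshold, so the terms fall into two groups, each of which could be read as a candidate topic.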

RELATED WORK
SEMANTIC GRAPH CONSTRUCTION
TERM CORRELATION BASED ON CO-OCCURRENCE
TOPIC MINING FROM THE REFINED SEMANTIC GRAPH
SUBGRAPH SEGMENTATION
TOPIC CLUSTERING
EXPERIMENT
TOPIC CLUSTERING RESULTS ANALYSIS
METRICS EVALUATION
DISCUSSION
Findings
CONCLUSION
