Abstract
Topic mining of scientific literature can accurately capture the contextual structure of a topic, track research hotspots within a field, and improve the availability of information about the literature. This paper introduces a multi-dimensional topic mining method based on a hierarchical semantic graph model. The main innovations include (1) the hierarchical extraction of feature terms and construction of a corresponding semantic graph and (2) multi-dimensional topic mining based on graph segmentation and structure analysis. Semantic graph construction is based primarily on hierarchical feature term extraction, which can effectively reveal the hierarchical structural distribution of feature terms within documents. Our graph model also takes into account the complementarity of content- and context-related feature terms in documents while avoiding the loss of textual information. In addition, the multi-dimensional features of a topic can be mined effectively via an in-depth analysis of the constructed graph, resulting in a quantitative visualization of the many-to-many associations between topics and feature terms. A variety of experiments on existing document datasets demonstrate that the proposed approach outperforms state-of-the-art methods in terms of accuracy and efficacy.
Highlights
With the rapid development of database and Web 2.0 technologies, the volume of literature available online is experiencing explosive growth
The main innovations include (1) hierarchical extraction of feature terms and construction of corresponding semantic graphs and (2) topic mining based on graph segmentation and structure analysis
Based on the concept of Latent Semantic Indexing (LSI), Hofmann [32] proposed a refinement known as probabilistic latent semantic analysis (PLSA)
Summary
With the rapid development of database and Web 2.0 technologies, the volume of literature available online is experiencing explosive growth. The nodes and edges in the graph can clearly reveal the complex relationships between feature terms and effectively highlight the documents’ core information [16], [17]. In this method, as in other graph-based topic mining methods, the construction of a document graph is the foundation. The main innovations include (1) hierarchical extraction of feature terms and construction of corresponding semantic graphs and (2) topic mining based on graph segmentation and structure analysis. The constructed graph can comprehensively represent the mutual relevance of feature terms based on both content and context while avoiding the loss of textual information; the hierarchical extraction of the terms used in the graph can effectively reveal the hierarchical structure to which they are related.
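To make the general idea of "build a term graph, then segment it" concrete, the following is a minimal sketch and not the paper's algorithm. It assumes documents are already tokenized into feature terms, uses plain window-based co-occurrence counts as edge weights (a stand-in for the content- and context-based relevance described above), and uses thresholded connected components as a stand-in for graph segmentation. The function names `build_term_graph` and `segment`, and the parameters `window` and `min_weight`, are hypothetical.

```python
from collections import defaultdict


def build_term_graph(docs, window=3):
    """Build an undirected, weighted term graph.

    Assumption: two terms are linked when they appear fewer than
    `window` positions apart in the same document; the edge weight
    counts how many such position pairs occur across all documents.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in docs:
        for i, a in enumerate(tokens):
            for b in tokens[i + 1:i + window]:
                if a != b:
                    graph[a][b] += 1
                    graph[b][a] += 1
    return graph


def segment(graph, min_weight=1):
    """Split the graph into candidate topic clusters.

    Assumption: edges lighter than `min_weight` are ignored, and each
    remaining connected component is treated as one cluster.
    """
    seen, parts = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour, weight in graph[node].items():
                if weight >= min_weight and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        parts.append(component)
    return parts


if __name__ == "__main__":
    # Toy corpus of pre-extracted feature terms (illustrative only).
    docs = [
        ["graph", "topic", "mining"],
        ["topic", "mining", "graph"],
        ["protein", "folding"],
    ]
    g = build_term_graph(docs, window=2)
    for cluster in segment(g, min_weight=1):
        print(sorted(cluster))
```

Raising `min_weight` keeps only the more strongly associated terms together, which loosely mirrors how a segmentation step separates tightly connected regions of the graph from weakly connected ones.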