Abstract
The diversity of large-scale semistructured data makes the extraction of implicit semantic information enormously difficult. This paper proposes an automatic, unsupervised method of text categorization in which tree-shaped structures represent semantic knowledge and implicit information is explored by mining hidden structures, without cumbersome lexical analysis. Mining implicit frequent structures in trees can discover both direct and indirect semantic relations, which largely enhances the accuracy of matching and classifying texts. The experimental results show that the proposed algorithm remarkably reduces the time and effort spent in training and classifying, and that it outperforms established competitors in correctness and effectiveness.
Highlights
The rapid development of social networks has brought explosive growth in the number of users as well as dramatic changes in the services provided
Zaki and Aggarwal [4] propose XMiner, a structural rule-based classifier for semistructured data, which mines frequent parent-child and ancestor-descendant branches and handles structured or semistructured data well; its shortcoming is the lack of semantic information in text representation
Semantic similarity assessment [7, 8] can be exploited to improve the accuracy of current information retrieval techniques [9], to automatically annotate documents [10, 11], to protect privacy [12, 13], to match web services [14], and to resolve problems based on knowledge reuse [15]
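The difference between the parent-child and ancestor-descendant branches mentioned for XMiner can be illustrated with a minimal sketch. This is not XMiner itself: the tuple-based tree representation, the function names, the sample labels, and the `min_support` threshold are all assumptions made for illustration.

```python
from collections import Counter

# Assumed representation: a tree is a (label, [children]) tuple.

def parent_child_pairs(tree):
    """Return the set of (parent, child) label pairs joined by a direct edge."""
    label, children = tree
    pairs = set()
    for child in children:
        pairs.add((label, child[0]))
        pairs |= parent_child_pairs(child)
    return pairs

def ancestor_descendant_pairs(tree, ancestors=()):
    """Return the set of (ancestor, descendant) label pairs at any depth."""
    label, children = tree
    pairs = {(a, label) for a in ancestors}
    for child in children:
        pairs |= ancestor_descendant_pairs(child, ancestors + (label,))
    return pairs

def frequent_pairs(trees, extract, min_support):
    """Keep the pairs that occur in at least min_support trees."""
    counts = Counter()
    for t in trees:
        counts.update(extract(t))  # a set, so each tree counts a pair once
    return {p for p, c in counts.items() if c >= min_support}

# Hypothetical sample documents.
t1 = ("doc", [("title", []), ("body", [("para", [])])])
t2 = ("doc", [("body", [("section", [("para", [])])])])

direct = frequent_pairs([t1, t2], parent_child_pairs, min_support=2)
indirect = frequent_pairs([t1, t2], ancestor_descendant_pairs, min_support=2)
```

In this toy data, `body` and `para` are linked by a direct edge in only one tree, so the pair is frequent only under the ancestor-descendant view. That is the kind of indirect relation a purely parent-child analysis would miss.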
Summary
The rapid development of social networks has brought explosive growth in the number of users as well as dramatic changes in the services provided. Semantic similarity assessment [7, 8] can be exploited to improve the accuracy of current information retrieval techniques [9], to automatically annotate documents [10, 11], to protect privacy [12, 13], to match web services [14], and to resolve problems based on knowledge reuse [15]. The proposed method mines implicit semantic information without cumbersome lexical analysis: links express semantic knowledge, and pointers record a traversal sequence that describes the differing abilities of nodes to express a text. It extracts semantic information by building trees and measures the similarity of texts by calculating the similarity of their coexisting hidden structures. A further contribution of the paper is to generate semantic trees by combining pointers with a fixed traversal strategy and to use subtrees as addenda structures. The last is to discover implicit knowledge by analyzing semantic trees and mining coexisting hidden structures.
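The idea of measuring text similarity through coexisting hidden structures can be sketched as follows. This is an illustrative approximation, not the paper's algorithm: it assumes that semantic trees are already built as (label, [children]) tuples, treats ancestor-descendant label pairs as the "hidden structures", and uses Jaccard overlap as the similarity measure. The function names and sample trees are hypothetical.

```python
# Assumed representation: a semantic tree is a (label, [children]) tuple.

def hidden_pairs(tree, ancestors=()):
    """Collect (ancestor, descendant) label pairs, direct or indirect."""
    label, children = tree
    pairs = {(a, label) for a in ancestors}
    for child in children:
        pairs |= hidden_pairs(child, ancestors + (label,))
    return pairs

def similarity(tree_a, tree_b):
    """Jaccard overlap of the structures that coexist in both trees."""
    a, b = hidden_pairs(tree_a), hidden_pairs(tree_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def classify(tree, labelled):
    """Assign the class of the most similar labelled tree."""
    return max(labelled, key=lambda item: similarity(tree, item[1]))[0]

# Hypothetical trees for two classes and one unlabelled text.
sports = ("match", [("team", [("player", [])]), ("score", [])])
finance = ("market", [("stock", [("price", [])]), ("bank", [])])
query = ("match", [("team", [("coach", [])]), ("score", [])])

predicted = classify(query, [("sports", sports), ("finance", finance)])
```

Here `query` shares two of six distinct structures with `sports` and none with `finance`, so it is assigned to `sports`. The actual method may weight structures differently, for instance by the traversal order recorded in the pointers.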