Abstract

The diversities of large-scale semistructured data make the extraction of implicit semantic information have enormous difficulties. This paper proposes an automatic and unsupervised method of text categorization, in which tree-shape structures are used to represent semantic knowledge and to explore implicit information by mining hidden structures without cumbersome lexical analysis. Mining implicit frequent structures in trees can discover both direct and indirect semantic relations, which largely enhances the accuracy of matching and classifying texts. The experimental results show that the proposed algorithm remarkably reduces the time and effort spent in training and classifying, which outperforms established competitors in correctness and effectiveness.

Highlights

  • Rapid developmental trend in social network means the explosive growth of users as well as dramatic changes in providing services

  • Zaki and Aggarwal [4] propose a structural rule-based classifier for semistructured data, called XMiner, which can mine out parent-child frequent branches and ancestor-descendant ones and conduct structured or semistructured data perfectly, but the shortness is the lack of semantic information in text representation

  • Semantic similarity assessment [7, 8] can be exploited to improve the accuracy of current information retrieval techniques [9], to automatically annotate documents [10, 11], to protect privacy [12, 13], to match web services [14], and to resolve problems based on knowledge reuse [15]

Read more

Summary

Introduction

Rapid developmental trend in social network means the explosive growth of users as well as dramatic changes in providing services. Semantic similarity assessment [7, 8] can be exploited to improve the accuracy of current information retrieval techniques [9], to automatically annotate documents [10, 11], to protect privacy [12, 13], to match web services [14], and to resolve problems based on knowledge reuse [15]. The method proposed can mine out implicit semantic information without cumbersome lexical analysis by making links express semantic knowledge and pointers record a traversal sequence which describes different abilities of nodes in expressing a text. The method proposed in this paper extracts semantic information by creating tresses and calculates the similarities of coexisting hidden structures to measure the similarities of texts. The other is to generate semantic trees based on the combining of pointers and a fixed traversal strategy and to use subtrees as addenda structures. The last one is to discover implicit knowledge by analyzing semantic trees and mining coexisting hidden structures

Representation of Semantic Information
Mining Implicit Frequent Structures
Scoring Tactics
Experiment
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call