Abstract

With the explosive growth of data, the increasing volume of text places higher demands on text categorization performance, demands that existing classification methods cannot satisfy. Building on a study of existing text classification technology and semantics, this paper proposes a classification algorithm oriented to Chinese text, the SAW (Structural Auxiliary Word) algorithm. The algorithm exploits the special spatial effect of Chinese text, in which words carry implied correlations, to mine text information and to classify text by high-correlation matching. Experiments show that the SAW classification algorithm significantly improves classification precision and recall, which markedly improves information retrieval performance and provides an effective means of using data for information extraction in the era of big data.

Highlights

  • With the rapid development of information technology, data of all kinds is growing rapidly

  • To address this problem, this paper proposes a novel classification algorithm for Chinese text based on the characteristics of Chinese grammar: the SAW (Structural Auxiliary Word) classification algorithm

  • The biggest difference between the SAW classification algorithm and previous classification algorithms is that it ranks texts by degree of relevance using a relevance weighting model, the SAW-Model, so that at retrieval time it is easy to find the texts most relevant to the search entry

Summary

Introduction

With the rapid development of information technology, data of all kinds is growing rapidly. Of the 40 ZB of data in 2020, text accounts for about 80%. How to manage this text information effectively, and how to solve related problems such as developing automatic text classification technology, are emerging research topics. Automatic text classification can classify and extract text data effectively; at the same time, it can improve the utilization rate of text data, the precision of retrieval, and so on. Compared with work on text representation and classification models, there is relatively more research on the VSM (vector space model), mainly focused on introducing and improving related results from the machine learning field. Although the classification performance and usability of these methods are better than those of earlier knowledge engineering methods, some problems remain, such as slow classification speed and low precision. This is because most text classification methods classify simply from the perspective of text matching and neglect the practical significance, or semantics, of the words themselves, which leads to low text matching efficiency. The four experiments are described in detail in Part 4.
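To make the limitation of pure text matching concrete, here is a minimal sketch of VSM-style classification using term-frequency vectors and cosine similarity. It is not the SAW algorithm; the function names (`tf_vector`, `cosine`, `classify`) and the toy class profiles are assumptions made for illustration, and the tokens are assumed to be already segmented, which for Chinese text would require a word segmentation step first.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Represent a document as raw term frequencies (a very basic VSM vector)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(tokens, class_profiles):
    """Return the label whose profile vector is most similar to the document."""
    query = tf_vector(tokens)
    return max(class_profiles, key=lambda label: cosine(query, class_profiles[label]))


# Hypothetical toy class profiles, for illustration only.
profiles = {
    "sports": tf_vector(["match", "team", "score", "team"]),
    "finance": tf_vector(["bank", "stock", "market", "bank"]),
}

label = classify(["team", "score", "win"], profiles)

# Two words with related meaning but different surface forms do not match at all.
synonym_similarity = cosine(tf_vector(["football"]), tf_vector(["soccer"]))
```

In this sketch, similarity depends only on shared surface tokens, so semantically related words that differ in form contribute nothing to the score. This is the kind of gap that the introduction attributes to matching-based methods.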

Related Work
Algorithm Idea
Process of Classification
Pre-Processing
Calculating Entries Weighting
Relevance Weighting Model—SAW-Model
The First Level of Relevancy
The Second Level of Relevancy
Same Class Relevancy Correction Value Vs
Experiment and Analyses
Performance Measure
The First Experiment
The Second Experiment
The Third Experiment
The Experiment of SAW Classification Algorithm
Conclusions
