Abstract
With the rapid expansion of network information and the emergence of large numbers of electronic texts, organizing and managing this mass of information has become a major challenge. Automatic text categorization studies how a machine can learn to classify unseen text, relieving the difficulties of manual classification. Because granular computing can reduce the dimension of knowledge when solving complex problems, it makes knowledge easier to summarize and acquire. It has become a research hotspot in recent years and offers new ideas for text classification. The rough set model of granular computing acquires knowledge by mining decision rules, which makes the decision process more transparent and easier to understand, and it has attracted attention and been applied in text classification research. Building on existing results, this paper further studies the application of granular computing to text categorization. After analyzing existing feature selection methods, a feature distribution is proposed based on the relationship between feature words and categories. The distribution distance between any two feature words is calculated, and feature words with similar distributions are aggregated. This effectively reduces the dimension of the feature space and avoids a problem of existing feature selection algorithms, in which individual samples are discarded because they contain none of the selected features. The experimental results show that, with a support vector machine (SVM) as the classifier, the clustering method obtains higher classification accuracy than other feature selection methods. SVM performs best, and the final text classification accuracy reaches 85.46%.
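The aggregation of feature words by distribution distance can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything specific here is an assumption: the feature distribution is taken to be a word's normalized document frequency over categories, the distance is Euclidean, the clustering is a simple greedy single pass, and the names `feature_distributions`, `distance`, and `cluster_features` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def feature_distributions(docs, labels):
    """Map each word to its normalized document frequency over categories.

    Assumption: this stands in for the paper's "feature distribution".
    """
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    counts = defaultdict(lambda: [0] * len(categories))
    for words, label in zip(docs, labels):
        for w in set(words):  # count each word once per document
            counts[w][index[label]] += 1
    return {w: [c / sum(row) for c in row] for w, row in counts.items()}


def distance(p, q):
    """Euclidean distance between two distributions (assumed metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cluster_features(dists, threshold):
    """Greedy single pass: a word joins the first cluster whose
    representative distribution lies within `threshold`, else starts a new one."""
    clusters = []  # list of (representative distribution, member words)
    for w in sorted(dists):
        for rep, members in clusters:
            if distance(rep, dists[w]) <= threshold:
                members.append(w)
                break
        else:
            clusters.append((dists[w], [w]))
    return [members for _, members in clusters]


# Toy corpus (invented for illustration only).
docs = [["goal", "match"], ["match", "team"],
        ["stock", "market"], ["market", "bank"]]
labels = ["sport", "sport", "finance", "finance"]

print(cluster_features(feature_distributions(docs, labels), threshold=0.1))
# [['bank', 'market', 'stock'], ['goal', 'match', 'team']]
```

Because every word is assigned to some cluster rather than dropped, each document keeps at least one non-zero feature after the six-word vocabulary is reduced to two cluster features, which is the property the abstract contrasts with conventional feature selection.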
Following the correlation principle of rough sets, feature selection is performed at each information granularity. The selected features serve as condition attributes, a coordination matrix is constructed from them, and a heuristic search for the most similar samples yields the attribute reduction set.
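To make the idea of attribute reduction concrete, here is a small sketch of a standard rough-set technique. It is not the procedure described above: the abstract does not define the coordination matrix or its heuristic search, so this example swaps in the common greedy reduction based on the dependency degree (the share of samples in the positive region). The function names `dependency` and `greedy_reduct` and the toy decision table are hypothetical.

```python
from collections import defaultdict


def dependency(rows, decisions, attrs):
    """Dependency degree: fraction of samples whose equivalence class
    under `attrs` maps to a single decision value (the positive region)."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    positive = sum(
        len(members)
        for members in classes.values()
        if len({decisions[i] for i in members}) == 1
    )
    return positive / len(rows)


def greedy_reduct(rows, decisions):
    """Add, one at a time, the attribute that raises the dependency degree
    most, until it matches the degree obtained with all attributes."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, decisions, all_attrs)
    reduct = []
    while dependency(rows, decisions, reduct) < target:
        best = max(
            (a for a in all_attrs if a not in reduct),
            key=lambda a: dependency(rows, decisions, reduct + [a]),
        )
        reduct.append(best)
    return reduct


# Toy decision table: condition attribute 0 alone determines the decision.
rows = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
decisions = ["a", "a", "b", "b"]

print(greedy_reduct(rows, decisions))
# [0]
```

A greedy search like this returns a subset that preserves the classification ability of the full attribute set, though not necessarily the smallest such subset.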
Highlights
The rapid development of information technology, especially the development of the Internet, has brought people into the era of information exchange
To address the complex data, difficult operation, cumbersome recognition process, and incomplete feature extraction found in text categorization methods, this paper studies a text classification method based on a granular computing algorithm
Feature words are aggregated, which effectively reduces the dimension of the feature space and avoids the phenomenon, caused by existing feature selection algorithms, in which individual samples are discarded because they do not contain the selected features; with a support vector machine (SVM) as the classifier, the clustering method achieves higher classification accuracy than other feature selection methods
Summary
The rapid development of information technology, especially the development of the Internet, has brought people into the era of information exchange. Granular computing can reduce the dimension of knowledge when solving complex problems, which makes knowledge easier to generalize and acquire. It has become a research hotspot in recent years and provides new ideas for the study of text classification. To address the complex data, difficult operation, cumbersome recognition process, and incomplete feature extraction found in text categorization methods, this paper studies a text classification method based on a granular computing algorithm. Feature words are aggregated, which effectively reduces the dimension of the feature space and avoids the phenomenon, caused by existing feature selection algorithms, in which individual samples are discarded because they do not contain the selected features. With SVM as the classifier, the clustering method achieves higher classification accuracy than other feature selection methods.