Abstract

In some specific fields, large numbers of ultra-short texts need to be categorized. Such texts have brief content, short length, and sparse features, and they span a large number of categories. To address these problems, this paper proposes an ultra-short text classification method based on the collaborative filtering algorithm. First, each ultra-short text is converted into a word-frequency vector by performing Chinese word segmentation and counting word frequencies. Second, using relevant data from the specific field, the ultra-short texts are defined as users and the categories as items, and a user-item recommendation matrix is constructed. Finally, text similarity is calculated with the cosine similarity method and the classification result is obtained. The experimental results show that the proposed method effectively handles the classification of ultra-short texts in specific fields. Its average accuracy is 9.19% higher than the vector space model and 3.81% higher than the topic similarity method.
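A minimal Python sketch of the first and last steps above (word-frequency vectors, then cosine similarity), assuming the texts have already been segmented into word lists. The abstract does not name a segmenter, so the hard-coded token lists below stand in for segmentation output. The function names `word_frequency_vectors` and `cosine_similarity` are illustrative, not taken from the paper. The similarity is the standard cos(a, b) = (a · b) / (‖a‖ ‖b‖).

```python
from collections import Counter
from math import sqrt


def word_frequency_vectors(segmented_texts):
    """Map each pre-segmented text (a list of words) to a word-frequency
    vector over one shared, sorted vocabulary."""
    vocabulary = sorted({word for words in segmented_texts for word in words})
    vectors = []
    for words in segmented_texts:
        counts = Counter(words)  # missing words count as 0
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


# Hypothetical segmentation output for three ultra-short texts.
texts = [
    ["发动机", "故障"],  # "engine", "fault"
    ["发动机", "漏油"],  # "engine", "oil leak"
    ["屏幕", "破裂"],    # "screen", "cracked"
]
vocab, vecs = word_frequency_vectors(texts)
print(round(cosine_similarity(vecs[0], vecs[1]), 3))  # 0.5
print(cosine_similarity(vecs[0], vecs[2]))            # 0.0
```

The first two texts share one of their two words, so their similarity is 0.5. The third text shares no words with the first, so it scores 0. This illustrates why sparse, very short texts make raw similarity coarse.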

Highlights

  • In recent years, with the advent of the Web 2.0 era, large volumes of short-text web data have been generated on the internet[1]

  • Since short text classification can be recast as a recommendation problem, this paper builds on the collaborative filtering model and proposes a hybrid recommendation model that incorporates relevant data from specific fields

  • The user and item information is assembled into a recommendation matrix, the similarity of each short text is calculated with cosine similarity, and the classification result is obtained by combining these similarities with the data to be classified
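The matrix-and-similarity step in these highlights can be sketched as plain user-based collaborative filtering. The sketch rests on assumptions that the summary does not spell out. Ratings are binary: an entry is 1 when a labeled text belongs to a category. A new text's neighbors are its k most cosine-similar labeled texts. Each category is scored by the similarity-weighted votes of those neighbors. The names `build_user_item_matrix` and `classify`, the parameter `k`, and the toy data are hypothetical. The domain-specific data that the paper mixes into its hybrid model is omitted.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_user_item_matrix(labels, categories):
    """Rows are labeled texts ("users"), columns are categories ("items").

    An entry is 1 when the text is labeled with that category, else 0
    (a binary-rating assumption).
    """
    column = {c: j for j, c in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in labels]
    for i, label in enumerate(labels):
        matrix[i][column[label]] = 1
    return matrix


def classify(query_vec, train_vecs, matrix, categories, k=3):
    """Return (best_category, scores) for one unlabeled text.

    Each category's score is the sum, over the k most similar labeled
    texts, of similarity * rating. The category is None when no neighbor
    shares any word with the query.
    """
    sims = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(train_vecs)]
    neighbors = sorted(sims, reverse=True)[:k]
    scores = [0.0] * len(categories)
    for sim, i in neighbors:
        for j, rating in enumerate(matrix[i]):
            scores[j] += sim * rating
    best = max(range(len(categories)), key=lambda j: scores[j])
    if scores[best] == 0:
        return None, scores
    return categories[best], scores


# Toy word-frequency vectors (hypothetical data) and their known labels.
train_vecs = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1]]
labels = ["engine", "engine", "display"]
categories = ["engine", "display"]

matrix = build_user_item_matrix(labels, categories)
label, scores = classify([1, 0, 0, 0], train_vecs, matrix, categories, k=2)
print(label)  # engine
```

In a full pipeline, the vectors would come from segmented texts rather than being written by hand. The value of k would need tuning on held-out data, because a very small k lets a single noisy neighbor decide the label.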

Introduction

With the advent of the Web 2.0 era, large volumes of short-text web data have been generated on the internet[1]. Classifying these data, and extracting key information from the text quickly and accurately, have become key issues in current data mining research. Among these network data are ultra-short texts from certain specific fields. Such texts contain few words and sparse features, and they span many categories. To address these problems, this paper proposes a new ultra-short text classification method that incorporates the relevant data features specific to some fields. The rest of this paper is organized as follows: Section 2 reviews the research status of short text classification, Section 3 describes the collaborative filtering algorithm model, and Section 4 proposes a short text classification method based on collaborative filtering.

Related works
Model of collaborative filtering
Cosine similarity metrics
Ultra-short text classification based on collaborative filtering
Similarity calculation of ultra-short text
Ultra-short text classification algorithm
Experiment
Pre-processing
Experimental evaluation
Findings
Conclusion