Research of specific field ultra-short text classification based on collaborative filtering algorithm

Weichen Yang, Yanwei Si, Nader Asnafi

doi:10.1051/matecconf/201818903006

Open Access

https://doi.org/10.1051/matecconf/201818903006

Abstract

Some specialized fields produce large volumes of ultra-short texts that need to be categorized. This paper proposes an ultra-short text classification method based on a collaborative filtering algorithm, targeting the problems of short length, sparse features, and a large number of categories in such fields. First, each ultra-short text is converted into a word-frequency vector through Chinese word segmentation and word-frequency counting. Second, using relevant data from the specific field, the ultra-short texts are treated as users and the categories as items, and a user-item recommendation matrix is constructed. Finally, text similarity is calculated with cosine similarity to obtain the classification results. The experimental results show that the proposed method handles the classification of ultra-short texts in specific fields well, with an average accuracy 9.19% and 3.81% higher than the vector space model and the topic similarity method, respectively.
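The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the texts are already segmented into tokens (the paper uses Chinese word segmentation, which would need a separate tool), it uses a binary text-by-category matrix, and it scores each category by a similarity-weighted sum, which is one common user-based collaborative filtering rule and may differ from the paper's. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """Map a list of tokens to a sparse word-frequency vector."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def build_user_item_matrix(labels, categories):
    """Rows are labeled texts ("users"); columns are categories ("items")."""
    return [[1.0 if label == cat else 0.0 for cat in categories]
            for label in labels]


def classify(tokens, train_vectors, matrix, categories):
    """Score each category by the similarity-weighted votes of labeled texts."""
    query = term_frequencies(tokens)
    sims = [cosine_similarity(query, vec) for vec in train_vectors]
    scores = [sum(sim * row[j] for sim, row in zip(sims, matrix))
              for j in range(len(categories))]
    best = max(range(len(categories)), key=lambda j: scores[j])
    return categories[best], scores


# Hypothetical, pre-segmented toy data (assumption: English tokens for brevity).
training = [
    (["screen", "cracked"], "hardware"),
    (["battery", "drains", "fast"], "hardware"),
    (["login", "failed"], "account"),
    (["password", "reset", "failed"], "account"),
]
categories = sorted({label for _, label in training})
train_vectors = [term_frequencies(tokens) for tokens, _ in training]
matrix = build_user_item_matrix([label for _, label in training], categories)

predicted, scores = classify(["login", "password", "failed"],
                             train_vectors, matrix, categories)
print(predicted)
```

On this toy data the query shares words only with the two "account" texts, so the sketch prints `account`. Replacing the binary matrix entries with domain-specific weights would be one way to bring in the field-specific data the abstract mentions, but that choice is an assumption here.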

Highlights

In recent years, with the advent of the Web 2.0 era, a large volume of short-text web data has been generated on the internet [1]
Because short text classification can be recast as a recommendation problem, this paper proposes a hybrid recommendation model that builds on collaborative filtering and uses relevant data from specific fields
The user and item information is assembled into a recommendation matrix, the similarity between short texts is calculated with cosine similarity, and the classification result is obtained by combining these similarities with the data to be classified
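
For reference, the standard cosine similarity between two word-frequency vectors can be written as below. The symbols a, b and the vocabulary size n are notation assumed here for illustration and are not taken from the paper.

```latex
\[
\operatorname{sim}(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}} \; \sqrt{\sum_{i=1}^{n} b_i^{2}}}
\]
```

Because word frequencies are non-negative, this value lies between 0 (no shared words) and 1 (identical direction).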

Summary

Introduction

With the advent of the Web 2.0 era, a large volume of short-text web data has been generated on the internet [1]. How to classify these data and extract key information from the text more quickly and accurately has become a key issue in current data mining research. Among these network data are ultra-short texts from certain specific fields. To address the problems of few words, sparse features, and many categories in ultra-short texts, this paper proposes a new ultra-short text classification method that incorporates the relevant data features of those specific fields. The structure of this paper is as follows: Section 2 reviews the research status of short text classification, Section 3 describes the collaborative filtering algorithm model, and Section 4 proposes a short text classification method based on collaborative filtering.

Related works

Model of collaborative filtering

Cosine similarity metrics

Ultra-short text classification based on collaborative filtering

Similarity calculation of ultra short text

Ultra-short text classification algorithm

Experiment

Pre-processing

Experimental evaluation

Findings

Conclusion

Journal: MATEC Web of Conferences	Publication Date: Jan 1, 2018
Citations: 1	License type: CC BY 4.0

Similar Papers

Rating Prediction on Movie Recommendation System: Collaborative Filtering Algorithm (CFA) vs. Dissymetrical Percentage Collaborative Filtering Algorithm (DSPCFA)
Johan Eko Purnomo ... Sukmawati Nur Endah
01 Oct 2019

Tourist Places Recommender System Using Cosine Similarity and Singular Value Decomposition Methods
Theriana Ayu Waskitaning Tyas ... Z K Abdurahman Baizal
Jurnal media informatika Budidarma | VOL. 5
26 Oct 2021

Prediction of virus-host infectious association by supervised learning methods
Mengge Zhang ... Fengzhu Sun
BMC bioinformatics | VOL. 18
01 Mar 2017

Latent Semantic Analysis (LSA) based object recognition and clustering
Vinaykumar Hebballi ... Vidhu Rojit
01 Oct 2015
