Abstract

We present a novel semi-supervised classification algorithm for short text based on fusion similarity, motivated by an analysis of the defects of existing short text classification algorithms. First, words that indicate a category are extracted from the labeled dataset to construct a strong category feature set, and a fusion similarity measure is designed by combining cosine similarity with a similarity based on the strong category features. Second, the mean of the supervised information is computed to determine the virtual class center point of each class, and the real class center point is then found as the text most similar to that class center. Finally, for each real class center we search the unlabeled dataset for the text with the highest similarity to it and assign that text the same class label as the center. The text is then added to the labeled collection, and the strong category feature set and the similarity matrix are updated. This process repeats until all short texts have been labeled. Experiments show that our method can significantly improve the efficiency of short text classification.
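The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not give the formulas, so several choices here are assumptions: a word counts as a strong category feature if it occurs in only one class of the labeled data, the strong-feature similarity is a Jaccard overlap, the two similarities are mixed linearly with a hypothetical weight `alpha`, and the strong feature set is refreshed once per round. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def tf(text):
    """Term-frequency vector of a whitespace-tokenised short text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def strong_features(labeled):
    """Assumption: a word is 'strong' if it occurs in exactly one class."""
    classes_of = defaultdict(set)
    for text, label in labeled:
        for w in tf(text):
            classes_of[w].add(label)
    return {w for w, cs in classes_of.items() if len(cs) == 1}


def fusion_similarity(a, b, strong, alpha=0.5):
    """Assumed fusion: alpha * cosine + (1 - alpha) * Jaccard overlap
    of the strong category features present in each text."""
    sa, sb = set(a) & strong, set(b) & strong
    union = sa | sb
    strong_sim = len(sa & sb) / len(union) if union else 0.0
    return alpha * cosine(a, b) + (1 - alpha) * strong_sim


def real_center(texts):
    """Virtual center = mean term vector of the class; real center =
    the labeled text of that class closest to the virtual center."""
    vectors = [tf(t) for t in texts]
    virtual = Counter()
    for v in vectors:
        for w, c in v.items():
            virtual[w] += c / len(vectors)
    return max(vectors, key=lambda v: cosine(v, virtual))


def self_train(labeled, unlabeled, alpha=0.5):
    """Repeatedly give each class the unlabeled text most similar to its
    real center, until no unlabeled text remains."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    while unlabeled:
        strong = strong_features(labeled)  # refreshed once per round
        by_class = defaultdict(list)
        for text, label in labeled:
            by_class[label].append(text)
        for label, texts in sorted(by_class.items()):
            if not unlabeled:
                break
            center = real_center(texts)
            best = max(
                unlabeled,
                key=lambda t: fusion_similarity(tf(t), center, strong, alpha),
            )
            unlabeled.remove(best)
            labeled.append((best, label))
    return labeled
```

Calling `self_train(labeled, unlabeled)` with a list of `(text, label)` pairs and a list of unlabeled strings returns the enlarged labeled list. Because each class claims one text per round regardless of how low the best similarity is, a practical version would probably add a similarity threshold, which the abstract does not mention.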
