A Novel Short Text Clustering Model Based on Grey System Theory

Hüseyin Fidan, Mehmet Erkan Yuksel

doi:10.1007/s13369-019-04191-0

Abstract

Short text clustering is challenging because of the structure of short texts, especially when it is applied to small datasets. The limited number of words leads to poor-quality feature vectors, low clustering accuracy, and failed analyses. Although several approaches have been proposed in the related literature, there is still no agreement on an efficient solution. Grey system theory, which gives better results in numerical analyses with insufficient data, has not yet been applied to short text clustering. The purpose of our study is to develop a short text clustering model based on Grey system theory that is applicable to small datasets. To measure the efficiency of our method, we obtained book reviews labeled as negative or positive from Amazon.com dataset collections and created small datasets from them. Grey relational clustering, as well as hierarchical and partitional algorithms, was applied to the small datasets separately. According to the results, our model achieves higher accuracy than the other algorithms when clustering small datasets containing short text. These results demonstrate that applying Grey relational clustering to short text clustering yields better results.

Full Text