Abstract

This paper proposes a weighted k-means clustering algorithm, based on the k-means algorithm (MacQueen, 1967; Anderberg, 1973), that can be used to cluster texts. First, the weighted k-means algorithm changes how text objects are described, converting categorical attributes to numeric ones so that the dissimilarity of text objects can be measured by Euclidean distance. Second, it uses a weight vector to reduce the effect of irrelevant attributes and to reflect the semantic information of text objects. In an experiment, the weighted k-means algorithm is shown to be more effective than the k-means algorithm for clustering texts.
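
The two steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how attributes are encoded or how the weight vector is derived, so this sketch assumes a 0/1 term-presence encoding and a weight vector supplied by hand. The names `encode`, `weighted_distance`, and `weighted_kmeans` are hypothetical and chosen only for illustration.

```python
import math
import random


def encode(docs, vocabulary):
    """Assumed encoding: each document (a set of terms) becomes a 0/1 presence vector."""
    return [[1.0 if term in doc else 0.0 for term in vocabulary] for doc in docs]


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_j w_j * (x_j - y_j)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def weighted_kmeans(points, k, weights, max_iter=100, seed=0):
    """Standard k-means loop, with the weighted distance used in the assignment step."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: weighted_distance(p, centers[c], weights))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy data (invented for illustration): two topics plus an uninformative term "the".
vocabulary = ["goal", "match", "stock", "market", "the"]
docs = [
    {"goal", "match", "the"},
    {"goal", "match"},
    {"stock", "market", "the"},
    {"stock", "market"},
]
weights = [1.0, 1.0, 1.0, 1.0, 0.0]  # assumed weights: "the" is treated as irrelevant

points = encode(docs, vocabulary)
labels, centers = weighted_kmeans(points, k=2, weights=weights)
print(labels)
```

Setting a weight to zero removes that attribute from the distance entirely, while intermediate values only dampen it. In this toy example the first two documents fall into one cluster and the last two into the other, whether or not a document contains "the".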
