Abstract

Clustering is an unsupervised machine learning technique widely used to arrange a set of observations into distinct groups called clusters. Categorical clustering has attracted much attention because many real-world applications produce categorical data. The k-modes algorithm was among the first developed in this context. It uses modes to represent the centroids of the clusters. However, its major drawback lies in the random selection of the modes in each iteration of the clustering process. In this paper, we tackle this random selection issue and propose a new method that identifies the most adequate modes from a list of candidates. The proposed algorithm, called Density Rough k-modes (DRk-M), computes the density of each candidate mode to characterize the distribution of the observations around it, and then uses Rough Set Theory to deal with the uncertainty involved in this process. DRk-M was evaluated on real-world datasets from the UCI (University of California, Irvine) Machine Learning Repository, the Global Terrorism Database (GTD), and a set of scraped tweets. It was compared with several state-of-the-art methods, including k-modes (1998), Ng's method (2007), Cao's method (2012), and Bai's technique (2014), and showed high efficiency.
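The abstract does not give DRk-M's formulas, so the sketch below only illustrates the ingredients it builds on: the classical k-modes loop (simple-matching dissimilarity, per-attribute mode update) and a density score used to pick initial modes from candidate observations instead of choosing them at random. The density definition (average per-attribute frequency of a candidate's values) and the "density times distance to the nearest chosen mode" selection rule are illustrative assumptions in the spirit of density-based initialization methods. They are not the DRk-M procedure, and the Rough Set step is omitted entirely. All function names are hypothetical.

```python
from collections import Counter


def matching_distance(x, y):
    """Simple-matching dissimilarity: number of attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def density(x, data):
    """Assumed density: average over attributes of the fraction of
    observations sharing x's value on that attribute (not DRk-M's definition)."""
    n, m = len(data), len(x)
    return sum(sum(row[j] == x[j] for row in data) / n for j in range(m)) / m


def init_modes_by_density(data, k):
    """Hypothetical deterministic initialization: take the densest observation,
    then repeatedly take the one maximizing density * distance to its nearest chosen mode."""
    dens = [density(x, data) for x in data]
    first = max(range(len(data)), key=lambda i: dens[i])
    modes = [tuple(data[first])]
    while len(modes) < k:
        nxt = max(
            range(len(data)),
            key=lambda i: dens[i] * min(matching_distance(data[i], mo) for mo in modes),
        )
        modes.append(tuple(data[nxt]))
    return modes


def update_mode(members):
    """Mode of a cluster: the most frequent category of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))


def k_modes(data, k, max_iter=100):
    """Classical k-modes loop, seeded by the density-based initialization above."""
    modes = init_modes_by_density(data, k)
    labels = None
    for _ in range(max_iter):
        # Assign each observation to its nearest mode.
        new_labels = [
            min(range(k), key=lambda c: matching_distance(x, modes[c])) for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Recompute each mode; keep the previous mode if a cluster becomes empty.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                modes[c] = update_mode(members)
    return labels, modes


if __name__ == "__main__":
    data = [
        ("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"),
        ("b", "y", "r"), ("b", "y", "r"), ("b", "z", "r"),
    ]
    print(k_modes(data, 2))
```

Because initialization is deterministic, repeated runs on the same data give the same clustering, which is the property the random-selection criticism above is concerned with.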
