Abstract

Outlier detection is an important data mining task with many contemporary applications. Clustering-based methods for outlier detection try to identify the data objects that deviate from the normal data. However, the uncertainty regarding the cluster membership of an outlier object has to be handled appropriately during the clustering process. Additionally, clustering data described by categorical attributes is challenging, because suitable methods and measures for such data are difficult to define. Addressing these issues, a novel algorithm for clustering categorical data aimed at outlier detection is proposed here by modifying the standard k-modes algorithm. The uncertainty in the clustering process is addressed by a soft computing approach based on rough sets. Accordingly, the modified clustering algorithm incorporates the lower and upper approximation properties of rough sets. The efficacy of the proposed rough k-modes clustering algorithm for outlier detection is demonstrated on various benchmark categorical data sets.
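The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to add lower and upper approximations to k-modes, in the style of Lingras and West's rough k-means. It rests on several assumptions that are not from the paper: simple matching dissimilarity, a hypothetical threshold `epsilon` that decides when an object is close to more than one mode (and therefore sits in the boundary region), hypothetical weights `w_lower` and `w_upper` used in a weighted vote for the new modes, the first k records as initial modes, and a hypothetical outlier score equal to the dissimilarity to the nearest mode. It should not be read as the authors' actual method.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def mismatch(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple matching dissimilarity: number of attributes whose values differ."""
    return sum(x != y for x, y in zip(a, b))


def rough_k_modes(data: List[Record], k: int, epsilon: int = 0,
                  w_lower: float = 0.7, w_upper: float = 0.3,
                  max_iter: int = 20):
    """Sketch of a rough k-modes pass (assumed design, not the paper's).

    Returns the modes, the lower approximation of each cluster, and the
    boundary region of each cluster; the upper approximation of cluster j
    is lower[j] + boundary[j].
    """
    # Assumption: the first k records serve as initial modes.
    modes = [tuple(r) for r in data[:k]]
    n_attr = len(data[0])
    lower: List[List[int]] = [[] for _ in range(k)]
    boundary: List[List[int]] = [[] for _ in range(k)]
    for _ in range(max_iter):
        lower = [[] for _ in range(k)]
        boundary = [[] for _ in range(k)]
        for idx, rec in enumerate(data):
            dists = [mismatch(rec, m) for m in modes]
            d_min = min(dists)
            # Clusters whose mode is within epsilon of the nearest one.
            close = [j for j, d in enumerate(dists) if d - d_min <= epsilon]
            if len(close) == 1:
                lower[close[0]].append(idx)  # certain member of one cluster
            else:
                for j in close:
                    boundary[j].append(idx)  # possible member of several clusters
        new_modes = []
        for j in range(k):
            mode = []
            for a in range(n_attr):
                # Weighted vote: lower-approximation members count more
                # than boundary members (hypothetical weighting scheme).
                score: Counter = Counter()
                for i in lower[j]:
                    score[data[i][a]] += w_lower
                for i in boundary[j]:
                    score[data[i][a]] += w_upper
                # Keep the previous value if the cluster received no objects.
                mode.append(score.most_common(1)[0][0] if score else modes[j][a])
            new_modes.append(tuple(mode))
        if new_modes == modes:
            break  # modes stopped changing
        modes = new_modes
    return modes, lower, boundary


def outlier_scores(data: List[Record], modes: List[Record]) -> List[int]:
    """Hypothetical score: dissimilarity of each record to its nearest mode."""
    return [min(mismatch(r, m) for m in modes) for r in data]


if __name__ == "__main__":
    toy = [("a", "x", "p"), ("b", "y", "q"), ("a", "x", "p"), ("a", "x", "r"),
           ("b", "y", "q"), ("b", "y", "s"), ("c", "z", "t")]
    modes, lower, boundary = rough_k_modes(toy, k=2)
    print(modes, lower, boundary, outlier_scores(toy, modes))
```

On this toy data set with two tight groups and one record unlike either, the last record is equally far from both modes, so it lands in the boundary region of both clusters and receives the highest score. That is the intuition behind using rough clustering for outlier detection, though the paper's own membership rule and scoring may differ.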

