Abstract

Outlier detection is an important requirement in data mining and machine learning. When data mining and machine learning algorithms are applied on the datasets with outliers, it leads to erroneous conclusion about the data. Therefore, researchers have been working in this field to remove outliers from dataset so that meaningful information from the datasets can be retrieved. In this paper, we take a cluster based ensemble approach for outlier detection, the backbone of which are some conventional clustering algorithms. Keeping in mind the drawbacks of supervised and semi supervised learning, we have relied on unsupervised learning algorithms. For our cluster based ensemble approach, we use three clustering algorithms, namely K-means, K-means++, and Fuzzy C-means. Our model intelligently combines results from individual clustering algorithms, assigning probabilities to each data point in order to decide its belongingness to a certain cluster. We have proposed a technique to assign a membership value to a data point in case of hard clustering algorithms, as we want to keep the flexibility of combining hard and soft clustering algorithms. From the probabilities assigned by the ensemble model, we then identify the outliers from the dataset. After removing these data points from the dataset, we obtain better values of cluster validity indices, thus reaffirming that removal of outliers has resulted in more stringent clusters of data. We have used five different cluster validity indices in our work to measure the goodness of the clusters formed, considering eight widely used datasets for evaluation of the proposed model amongst which three are large datasets. We have noticed a significant improvement in the cluster validity indices after applying our outlier detection algorithm. The experimental results prove that the proposed method is empirically sound.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call