Abstract

The MCOKE algorithm, which assigns data objects to multiple clusters, is known for its simplicity and effectiveness. Its drawback is that it uses maxdist as a global threshold for assigning objects to one or more clusters, and this threshold is sensitive to outliers. Outliers in a dataset can significantly reduce the effectiveness of maxdist for overlapping clustering. In this paper, outlier detection is incorporated into the MCOKE algorithm so that it can detect and remove outliers that would otherwise participate in the calculation that assigns objects to one or more clusters. The improved MCOKE algorithm provides better identification of overlapping clusters. Performance was evaluated using the F1 score. Evaluation results showed that the outlier detection achieved a higher accuracy rate in identifying abnormal data (outliers) when applied to real datasets.
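The sensitivity of a global maxdist threshold to outliers can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes maxdist is the largest distance from any object to its nearest centroid, that an object joins every cluster whose centroid lies within maxdist, and that the centroids are fixed rather than learned by k-means. The function names `overlap_assign` and `count_overlaps` and the toy data are hypothetical.

```python
import math

def overlap_assign(points, centroids):
    """Return (maxdist, memberships) under the assumed maxdist rule."""
    # Step 1: nearest-centroid assignment, as a single k-means pass would give.
    nearest = [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]
    # Step 2 (assumption): maxdist is the largest object-to-own-centroid distance.
    maxdist = max(math.dist(p, centroids[c]) for p, c in zip(points, nearest))
    # Step 3: an object belongs to every cluster whose centroid is within maxdist.
    memberships = [
        {j for j in range(len(centroids)) if math.dist(p, centroids[j]) <= maxdist}
        for p in points
    ]
    return maxdist, memberships

def count_overlaps(memberships):
    """Number of objects assigned to more than one cluster."""
    return sum(1 for m in memberships if len(m) > 1)

# Hypothetical toy data: two fixed centroids and a few nearby objects.
centroids = [(0.0, 0.0), (10.0, 0.0)]
clean = [(0, 1), (1, 0), (-1, 0), (9, 0), (10, 1), (11, 0), (4, 0)]
noisy = clean + [(0, 9.5)]  # one outlier far from both centroids

clean_maxdist, clean_members = overlap_assign(clean, centroids)
noisy_maxdist, noisy_members = overlap_assign(noisy, centroids)

print(clean_maxdist, count_overlaps(clean_members))
print(noisy_maxdist, count_overlaps(noisy_members))
```

In this toy data the single outlier raises maxdist from 4.0 to 9.5, and three objects that had one membership pick up a second one. Removing the outlier before computing the threshold restores the original assignments, which is the motivation for adding an outlier detection step.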

Highlights

  • Data mining is the method of extracting patterns from data [1]

  • Based on the experiment, incorporating outlier detection into the MCOKE algorithm provides better identification of overlapping clustering results, whereas outliers in the datasets reduce the effectiveness of MCOKE in identifying which objects belong to multiple clusters

  • The proposed method achieved a 71% accuracy rate in identifying outliers, whereas the existing outlier detection methods achieved less than 50%


Summary

Introduction

Data mining is the method of extracting patterns from data [1]. It is the most important part of the KDD (Knowledge Discovery in Databases) process, which finds meaningful information and discovers new patterns in massive collections of data [2]. The identified patterns are used to mine a variety of information for numerous applications [3]. Data clustering can be considered one of the most important and challenging data mining techniques in the knowledge discovery process. It is a machine learning tool that is widely used to detect hidden structure or to outline data categories in several domains such as biology, system engineering and social sciences [4], [5]. Clustering, as an unsupervised learning technique, aims to group similar patterns within the same cluster and to place dissimilar patterns in different clusters [6].
