Abstract
Advances made to the traditional clustering algorithms solves the various problems such as curse of dimensionality and sparsity of data for multiple attributes. The traditional H-K clustering algorithm can solve the randomness and apriority of the initial centers of K-means clustering algorithm. But when we apply it to high dimensional data it causes the dimensional disaster problem due to high computational complexity. All the advanced clustering algorithms like subspace and ensemble clustering algorithms improve the performance for clustering high dimension dataset from different aspects in different extent. Still these algorithms will improve the performance form a single perspective. The objective of the proposed model is to improve the performance of traditional H-K clustering and overcome the limitations such as high computational complexity and poor accuracy for high dimensional data by combining the three different approaches of clustering algorithm as subspace clustering algorithm and ensemble clustering algorithm with H-K clustering algorithm.
Highlights
As an important technique in data mining, clustering analysis groups the observations having similar properties which can be called as an unsupervised classification[1] which helps to extract the relevant information from high dimensional data
The proposed model combines the three techniques, subspace clustering, H-K clustering and ensemble clustering and their advantages to improve the performance of clustering result on high dimensional data which will simultaneously overcome the limitations of H-K clustering algorithm for high dimensional data
A lot of work has been done in the area of clustering, based on the research until date, the general categorization for high dimensional data set clustering includes: 1- Dimension reduction, 2- Subspace clustering, 3 - Ensemble Clustering and 4 - H-K clustering [1] [11] [14]
Summary
As an important technique in data mining, clustering analysis groups the observations having similar properties which can be called as an unsupervised classification[1] which helps to extract the relevant information from high dimensional data. Ensemble clustering ‘the knowledge reuse framework’, firstly proposed by Strel and Ghosh [11] is the technique which uses the two mechanisms as generation mechanism which generates the clusters using different criteria and consensus function will choose the most appropriate solution form the set of solutions. It overcome the challenges created by high dimensional data and gives high performance on real world datasets for applications as Internet applications and medical diagnostics [2,3,12,13,19,20]. The proposed model combines the three techniques, subspace clustering, H-K clustering and ensemble clustering and their advantages to improve the performance of clustering result on high dimensional data which will simultaneously overcome the limitations of H-K clustering algorithm for high dimensional data ( as high computational complexity and poor accuracy)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: International Journal of Information Technology Convergence and Services
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.