Abstract
Clustering large volumes of high-dimensional data is a challenging task. Many clustering algorithms have been developed to handle datasets with either a very large sample size or a very high number of dimensions, but they are often impractical when the data is large in both respects. To overcome both the ‘curse of dimensionality’ caused by high dimensionality and the scalability problems caused by large sample size, we propose a new fast clustering algorithm called FensiVAT. FensiVAT is a hybrid, ensemble-based clustering algorithm that uses fast data-space reduction and an intelligent sampling strategy. In addition to clustering, FensiVAT provides visual evidence that can be used to estimate the number of clusters in the data (cluster tendency assessment). In our experiments, we compare FensiVAT with nine state-of-the-art approaches that are popular for clustering large-sample or high-dimensional data. The experimental results suggest that FensiVAT, which can cluster large volumes of high-dimensional data in a few seconds, is the fastest and most accurate of the methods tested.
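The abstract names three ingredients (data-space reduction, a sampling strategy, and a visual assessment of cluster tendency) without specifying how each is realized. The following is a minimal sketch of one plausible pipeline, not the authors' implementation. It assumes a Gaussian random projection for the reduction step, maximin (farthest-first) sampling for the sampling step, and the standard VAT reordering of a distance matrix for the visual step. The function names `random_projection`, `maximin_sample`, and `vat_order` are hypothetical and chosen only for illustration.

```python
import math
import random


def random_projection(points, target_dim, rng):
    """Project points to target_dim dimensions with a Gaussian random matrix.

    Assumption: the abstract's "data-space reduction" is modelled here as a
    plain random projection; the paper may use a different scheme.
    """
    dim = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)]
              for _ in range(target_dim)]
    return [[sum(w * x for w, x in zip(row, p)) for row in matrix]
            for p in points]


def maximin_sample(points, k, start=0):
    """Pick k indices; each new index is the point farthest from those chosen.

    Assumption: the abstract's "intelligent sampling" is modelled here as
    maximin (farthest-first) sampling.
    """
    chosen = [start]
    # nearest[i] = distance from point i to its closest chosen point
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt]))
                   for d, p in zip(nearest, points)]
    return chosen


def vat_order(dist):
    """Return the VAT (Prim-like) reordering of a square distance matrix."""
    n = len(dist)
    # Start from one endpoint of the largest pairwise distance.
    first, _ = max(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: dist[rc[0]][rc[1]])
    order = [first]
    remaining = set(range(n)) - {first}
    while remaining:
        # Add the remaining point closest to any point already ordered.
        nxt = min(remaining, key=lambda j: min(dist[p][j] for p in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

In this sketch, one would project the data, draw a small maximin sample, compute the sample's pairwise distance matrix, and display that matrix with rows and columns permuted by `vat_order`. Well-separated clusters then tend to appear as dark blocks along the diagonal, and the number of blocks gives a visual estimate of the number of clusters.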
More From: IEEE Transactions on Knowledge and Data Engineering