Abstract

Background: A well-known problem in cluster analysis is finding an optimal number of clusters reflecting the inherent structure of the data. PFClust is a partitioning-based clustering algorithm capable, unlike many widely-used clustering algorithms, of automatically proposing an optimal number of clusters for the data.

Results: The results of tests on various types of data showed that PFClust can discover clusters of arbitrary shapes, sizes and densities. The previous implementation of the algorithm had already been successfully used to cluster large macromolecular structures and small druglike compounds. We have greatly improved the algorithm by a more efficient implementation, which enables PFClust to process large data sets acceptably fast.

Conclusions: In this paper we present a new optimized implementation of the PFClust algorithm that runs considerably faster than the original.

Highlights

  • Cluster analysis [1] comprises methods designed to find structure in a dataset

  • One of the main challenges introduced by the lack of class labels is determining an optimal number of clusters that reflect the inherent structure present in the data

  • We have developed a novel clustering technique called PFClust [9] that automatically discovers an optimum partitioning of the data without requiring prior knowledge of the number of clusters

Summary

Introduction

Cluster analysis [1] comprises methods designed to find structure in a dataset. Data can be divided into clusters that help us understand the problem domain, inform ongoing investigation, or form input for other data analysis techniques.

* Correspondence: lazaros.mavridis.lm@gmail.com
2 EaStCHEM School of Chemistry and Biomedical Sciences Research Complex, University of St Andrews, North Haugh, St Andrews, Scotland KY16 9ST, UK
Full list of author information is available at the end of the article

Method

PFClust consists of two steps: threshold estimation and clustering.
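The text names the two steps but does not describe them, so the following is only an illustrative sketch of a generic "estimate a threshold, then cluster" pipeline, not the PFClust implementation. Everything in it is an assumption made for the example: the function names `estimate_threshold` and `cluster`, the use of random partitions and a percentile to pick the threshold, the average-linkage merge rule, and the toy similarity measure.

```python
import random
from itertools import combinations


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def estimate_threshold(sim, n_trials=200, percentile=0.95, seed=0):
    """Step 1 (hypothetical): derive a similarity threshold from random partitions.

    Randomly assigns items to clusters many times, records the mean pairwise
    similarity inside each random cluster, and returns a high percentile of
    those values.
    """
    n = len(sim)
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        k = rng.randint(2, max(2, n // 2))  # random number of clusters
        groups = {}
        for item in range(n):
            groups.setdefault(rng.randrange(k), []).append(item)
        for members in groups.values():
            if len(members) >= 2:
                scores.append(mean(sim[i][j] for i, j in combinations(members, 2)))
    if not scores:
        raise ValueError("no random cluster contained two or more items")
    scores.sort()
    return scores[min(len(scores) - 1, int(percentile * len(scores)))]


def cluster(sim, threshold):
    """Step 2 (hypothetical): greedy average-linkage merging.

    Starts from singletons and repeatedly merges the most similar pair of
    clusters, stopping once the best pair falls below the threshold. The
    number of clusters is therefore an output, not an input.
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best_score, best_pair = None, None
        for a, b in combinations(range(len(clusters)), 2):
            score = mean(sim[i][j] for i in clusters[a] for j in clusters[b])
            if best_score is None or score > best_score:
                best_score, best_pair = score, (a, b)
        if best_score < threshold:
            break
        a, b = best_pair  # a < b, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy data: two well-separated groups of 1-D points (made up for illustration).
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
sim = [[1.0 / (1.0 + abs(x - y)) for y in points] for x in points]

threshold = estimate_threshold(sim)   # data-driven threshold (depends on seed)
clusters = cluster(sim, threshold=0.5)  # fixed threshold keeps the demo predictable
```

The demo passes a fixed threshold of 0.5 to `cluster` because the estimated value varies with the random seed and percentile; in a real pipeline the output of the first step would feed the second.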
