Abstract

Clustering is widely used to discover underlying patterns and groups in data, and there is a need to validate the quality of the clusters generated by the numerous clustering algorithms in use. The need for cluster validation arises from the fundamental definition of unsupervised learning: because no reference labels are available, predicting the correct number of clusters is a hurdle, which can be cleared by using cluster validity indices to assess the quality of the clusters. We have developed a tool for cluster validation as a part of GOAPhAR, a web-based tool that integrates, from disparate sources, information regarding gene annotations, protein annotations, identifiers associated with probe sets, functional pathways, protein interactions, Gene Ontology, and publicly available microarray datasets. Our cluster validity tool calculates three indices that indicate clustering quality, namely the Silhouette, Dunn's, and Davies-Bouldin indices, and outputs them to the user. The values of these indices can be used to judge the quality of clustering and to optimize the process of selecting an appropriate clustering algorithm and number of clusters. The tool is freely available at http://bioinformatics.kumc.edu/goaphar/
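
To make the three indices concrete, the following is a minimal sketch of how they are commonly defined, not the GOAPhAR implementation. It assumes Euclidean distance, the textbook form of each index (for Dunn's index: smallest between-cluster point distance divided by largest within-cluster diameter), and at least two clusters with at least one cluster holding two or more distinct points. The function names and the toy data are illustrative only.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return dict(clusters)


def centroid(cluster):
    """Coordinate-wise mean of the points in one cluster."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def silhouette(points, labels):
    """Mean silhouette width; ranges from -1 to 1, higher is better."""
    clusters = group_by_label(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(dist(point, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def dunn(points, labels):
    """Dunn's index: min separation / max diameter; higher is better."""
    clusters = list(group_by_label(points, labels).values())
    min_separation = min(dist(p, q)
                         for i, ci in enumerate(clusters)
                         for cj in clusters[i + 1:]
                         for p in ci for q in cj)
    max_diameter = max(dist(p, q) for c in clusters for p in c for q in c)
    return min_separation / max_diameter


def davies_bouldin(points, labels):
    """Davies-Bouldin index; lower is better."""
    clusters = list(group_by_label(points, labels).values())
    centers = [centroid(c) for c in clusters]
    # scatter: mean distance from each cluster's points to its centroid
    scatter = [sum(dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, centers)]
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k


# Toy data (hypothetical): two well-separated unit squares.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

print("Silhouette:", silhouette(points, labels))
print("Dunn:", dunn(points, labels))
print("Davies-Bouldin:", davies_bouldin(points, labels))
```

For this toy partition the indices agree that the clustering is good: the Silhouette value is close to 1, Dunn's index is large, and the Davies-Bouldin index is small. Recomputing the indices for several candidate algorithms or cluster counts and comparing the values is one way to use them for the selection task described above.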
