Abstract

Background: Biological data comprises various topologies or a mixture of forms, which makes its analysis extremely complicated. With this data increasing on a daily basis, the design and development of efficient and accurate statistical methods has become absolutely necessary. Specific analyses, such as those related to genome-wide association studies and multi-omics information, are often aimed at clustering sub-conditions of cancers and other diseases. Hierarchical clustering methods, which can be categorized as agglomerative or divisive, have been widely used in such situations. However, unlike agglomerative methods, divisive clustering approaches have consistently proved to be computationally expensive.

Results: The proposed clustering algorithm (DRAGON) was verified on mutation and microarray data and benchmarked against standard clustering methods from the literature. Its validation included both synthetic and significant biological data. When validated on mixed-lineage leukemia data, DRAGON achieved the highest clustering accuracy with data of four different dimensions. Likewise, DRAGON outperformed previous methods on 3-, 4- and 5-dimensional acute leukemia data. When tested on mutation data, DRAGON achieved the best performance with 2-dimensional information.

Conclusions: This work proposes a computationally efficient divisive hierarchical clustering method that can compete on equal terms with agglomerative approaches. The proposed method correctly clustered data with distinct topologies. A MATLAB implementation can be downloaded from http://www.riken.jp/en/research/labs/ims/med_sci_math/ or http://www.alok-ai-lab.com
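The abstract reports results as "clustering accuracy". Its exact definition is not given in this summary; a common convention, assumed here, is best-match accuracy: the fraction of samples whose predicted cluster agrees with the true class after the best one-to-one relabelling of cluster ids. The minimal sketch below (the function name clustering_accuracy is hypothetical) computes it by brute force, which is only practical for a handful of clusters.

```python
from itertools import permutations


def clustering_accuracy(true_labels, pred_labels):
    """Best-match accuracy (assumed definition): fraction of samples whose
    predicted cluster, after an optimal one-to-one relabelling, agrees
    with the true class."""
    if len(true_labels) != len(pred_labels) or not true_labels:
        raise ValueError("need two non-empty label sequences of equal length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(pred_labels))
    # With more clusters than classes, surplus clusters map to None and
    # every sample in them counts as an error.
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    # Brute force over injective maps cluster -> class (O(k!) mappings).
    for assignment in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, assignment))
        correct = sum(mapping[p] == t for t, p in zip(true_labels, pred_labels))
        best = max(best, correct)
    return best / len(true_labels)
```

For example, clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]) is 1.0, because the clusters match the classes exactly up to renaming. For many clusters, the same matching problem can be solved in polynomial time with the Hungarian algorithm (for instance scipy.optimize.linear_sum_assignment on a contingency table).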

Highlights

  • Biological data comprises various topologies or a mixture of forms, which makes its analysis extremely complicated

  • Divisive procedures, which start with the entire dataset, are in general considered safer than agglomerative approaches [21, 23]

  • The divisive procedure has not been generally used for hierarchical clustering, remaining largely ignored in the literature


Summary

Introduction

Biological data comprises various topologies or a mixture of forms, which makes its analysis extremely complicated. With this data increasing on a daily basis, the design and development of efficient and accurate statistical methods has become absolutely necessary. Divisive hierarchical clustering methods perform clustering in the opposite direction to their agglomerative counterparts: they begin with a single group containing all the samples and divide a group into two at each stage until every group comprises only a single sample [21, 22]. False decisions made in early stages cannot be corrected later on. For this reason, divisive procedures, which start with the entire dataset, are in general considered safer than agglomerative approaches [21, 23]. Nevertheless, the divisive procedure has seldom been used for hierarchical clustering and remains largely ignored in the literature.
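The top-down procedure described above can be sketched in a few lines. An exhaustive divisive step is costly: a group of m samples can be split into two non-empty groups in 2^(m-1) - 1 ways, so practical divisive methods choose each split heuristically. This minimal sketch assumes one common heuristic, bisecting 2-means seeded with the farthest pair of samples; it is not DRAGON, and the names divisive_clustering and two_means_split are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def two_means_split(points, idx, iters=20):
    """Split sample indices `idx` into two non-empty groups with 2-means.

    Heuristic (an assumption of this sketch): seed the two centroids with
    the pair of samples farthest apart, then run Lloyd iterations.
    """
    a, b = max(((i, j) for i in idx for j in idx if i < j),
               key=lambda pair: sq_dist(points[pair[0]], points[pair[1]]))
    ca, cb = points[a], points[b]
    left = right = None
    for _ in range(iters):
        # Assign each sample to the nearer centroid (ties go left).
        new_left = [i for i in idx
                    if sq_dist(points[i], ca) <= sq_dist(points[i], cb)]
        if not new_left or len(new_left) == len(idx) or new_left == left:
            break  # degenerate assignment, or converged
        left = new_left
        in_left = set(left)
        right = [i for i in idx if i not in in_left]
        ca = centroid([points[i] for i in left])
        cb = centroid([points[i] for i in right])
    if not left or not right:
        # Degenerate case (e.g. identical samples): peel off one sample.
        return [idx[0]], idx[1:]
    return left, right


def divisive_clustering(points, idx=None):
    """Top-down hierarchy: start from one group holding every sample and
    split each group in two until every group is a single sample.
    Returns a binary tree as nested tuples whose leaves are sample indices."""
    if idx is None:
        idx = list(range(len(points)))
    if not idx:
        raise ValueError("need at least one sample")
    if len(idx) == 1:
        return idx[0]
    left, right = two_means_split(points, idx)
    return (divisive_clustering(points, left),
            divisive_clustering(points, right))
```

For example, divisive_clustering([(0, 0), (0, 1), (10, 10), (10, 11)]) returns ((0, 1), (2, 3)): the first split separates the two well-separated pairs, and each pair is then split into singletons. Because each split is committed greedily, a poor early split propagates to every level below it, which is the same "early decisions cannot be corrected" property noted above.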
