CLAG: an unsupervised non hierarchical clustering algorithm handling biological data

Linda Dib, Alessandra Carbone

doi:10.1186/1471-2105-13-194

Abstract

Background: Searching for similarities in a set of biological data is intrinsically difficult because some data points should not be clustered at all, while others should belong to several clusters. Under these hypotheses, hierarchical agglomerative clustering is not appropriate. Moreover, if too little is known about the dataset, as is often the case, supervised classification is not appropriate either.

Results: CLAG (for CLusters AGgregation) is an unsupervised non-hierarchical clustering algorithm designed to cluster a large variety of biological data and to provide a clustered matrix and numerical values indicating cluster strength. CLAG clusters correlation matrices for residues in protein families, gene-expression and miRNA data related to various cancer types, sets of species described by multidimensional vectors of characters, and binary matrices. It does not require all data points to cluster, and it converges, yielding the same result at each run. Its simplicity and speed allow it to run on reasonably large datasets.

Conclusions: CLAG can be used to investigate the cluster structure present in biological datasets and to identify its underlying graph. It proved more informative and accurate than several known clustering methods, such as hierarchical agglomerative clustering, k-means, fuzzy c-means, model-based clustering, and affinity propagation clustering, and it does not suffer from the convergence problem of the latter.

Highlights

Searching for similarities in a set of biological data is intrinsically difficult because some data points should not be clustered at all, while others should belong to several clusters
Closeness between entries: CLAG clusters the elements in N according to E; to explain how it does so, we introduce the notion of closeness between pairs of entries
Its range of application is wide, as illustrated by the datasets we discussed

Summary

Introduction

Searching for similarities in a set of biological data is intrinsically difficult because some data points should not be clustered at all, while others should belong to several clusters. Clustering biological data often requires looking for the proximity of a few data points within a large dataset, with the aim of grouping together only those that satisfy the same set of constraints, possibly resulting from the same functional origins, or that have undergone the same evolutionary pressures. This is the case for amino acids in proteins, where one expects only a few of the residues to account for the structural stability of the protein or for its functional activity.
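To make the idea of grouping only a few strongly related data points concrete, here is a minimal Python sketch. It is not the CLAG procedure, which this page does not describe in detail; it is a simple stand-in that links entries whose pairwise correlation reaches a chosen threshold and leaves every other entry unclustered. The function name `threshold_clusters`, the `threshold` parameter, and the toy matrix are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def threshold_clusters(
    corr: Sequence[Sequence[float]], threshold: float
) -> Tuple[List[List[int]], List[int]]:
    """Group indices linked by pairwise correlation >= threshold.

    Illustrative stand-in only (hypothetical helper, NOT the CLAG algorithm).
    Returns (clusters, unclustered): groups of size >= 2, and the indices
    that were not linked to any other entry.
    """
    n = len(corr)
    parent = list(range(n))  # union-find forest, one tree per group

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair that is correlated strongly enough.
    for i in range(n):
        for j in range(i + 1, n):
            if corr[i][j] >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Always keep the smaller index as root: deterministic.
                    parent[max(ri, rj)] = min(ri, rj)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    clusters = sorted(g for g in groups.values() if len(g) > 1)
    unclustered = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, unclustered


# Toy symmetric correlation matrix for five entries (made-up values).
toy_corr = [
    [1.0, 0.9, 0.1, 0.2, 0.0],
    [0.9, 1.0, 0.2, 0.1, 0.1],
    [0.1, 0.2, 1.0, 0.85, 0.0],
    [0.2, 0.1, 0.85, 1.0, 0.3],
    [0.0, 0.1, 0.0, 0.3, 1.0],
]
clusters, unclustered = threshold_clusters(toy_corr, threshold=0.8)
print("clusters:", clusters)        # entries 0-1 and 2-3 are grouped
print("unclustered:", unclustered)  # entry 4 is left out
```

The sketch shares two properties with what the abstract states about CLAG: it does not force every data point into a cluster, and it returns the same result on every run because nothing in it is randomized. Unlike the setting described above, it cannot place one entry in several clusters.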



Journal: BMC Bioinformatics	Publication Date: Aug 8, 2012
Citations: 40	License type: CC BY 2.0


