Partitioning clustering algorithms for protein sequence data sets

Sondes Fayech, Mohamed Limam, Nadia Essoussi

doi:10.1186/1756-0381-2-3

Open Access

https://doi.org/10.1186/1756-0381-2-3

Journal: BioData Mining	Publication Date: Apr 2, 2009
Citations: 45	License type: CC BY 2.0

Affiliation: Tunis University

Abstract

Background: Genome-sequencing projects are producing an enormous number of new sequences, causing protein sequence databases to grow rapidly. The unsupervised classification of these data into functional groups or families, known as clustering, has become one of the principal research objectives in structural and functional genomics. Computer programs that classify sequences into families automatically and accurately have become a necessity. Many methods have addressed the clustering of protein sequences, and most of them fall into three major groups: hierarchical, graph-based and partitioning methods. Of these, hierarchical and graph-based approaches have been widely used. Although partitioning clustering techniques are widely used in other fields, they have seen few applications in protein sequence clustering. It has not been fully demonstrated whether partitioning methods can be applied to protein sequence data, or whether they are efficient compared with published clustering methods.

Methods: We developed four partitioning clustering approaches that use the Smith-Waterman local-alignment algorithm to determine pair-wise similarities between sequences. Four different sets of protein sequences were used as evaluation data sets for the proposed methods.

Results: We show that these methods outperform several other published clustering methods in terms of correctly predicting a classifier and especially in terms of the correctness of the provided prediction. The software is available to academic users from the authors upon request.

Highlights

Genome-sequencing projects are producing an enormous number of new sequences, causing protein sequence databases to grow rapidly
The Smith-Waterman local alignment algorithm [3] helps to find conserved amino acid patterns in protein sequences
The main idea is to design and develop efficient clustering algorithms based on partitioning techniques, which have been little investigated in protein sequence clustering, in order to cluster large sets of protein sequences
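
To make the role of local alignment concrete, the following is a minimal sketch of a Smith-Waterman score computation. It is an illustration under simplifying assumptions, not the authors' implementation: the match, mismatch and gap values are hypothetical placeholders, and the gap penalty is linear. Protein alignments in practice typically score residue pairs with a substitution matrix and often use affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions (simple match/mismatch
    scores and a linear gap penalty), not parameters from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Clamping at 0 lets an alignment restart anywhere,
            # which is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Two sequences sharing the conserved segment "ACDE".
    print(smith_waterman_score("XXACDEYY", "ZZACDEWW"))
```

Because cell values never drop below zero, a conserved segment shared by two otherwise unrelated sequences still produces a high score, which is why local alignment suits the search for conserved amino acid patterns.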

Summary

Introduction

Genome-sequencing projects are producing an enormous number of new sequences, causing protein sequence databases to grow rapidly. Many methods have addressed the clustering of protein sequences, and most of them fall into three major groups: hierarchical, graph-based and partitioning methods. Although partitioning clustering techniques are widely used in other fields, they have seen few applications in protein sequence clustering. Protein sequences are compared and grouped using alignment methods, and pair-wise alignment is used both to compare and to cluster sequences. There are two types of pair-wise sequence alignment, local and global [1,2]. Clustering a large protein data set into meaningful clusters with pair-wise alignment is computationally expensive because of the large number of comparisons required: each protein in the data set must be compared with every other protein in the data set.
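
The cost of all-against-all comparison, and the general shape of a partitioning method that works from pair-wise similarities, can be sketched as follows. This is a generic k-medoids style procedure given only as an illustration; it should not be read as any of the four approaches developed in the paper. The function names, the precomputed similarity matrix `sim` and the choice of initial medoids are all assumptions.

```python
def num_pairwise_comparisons(n):
    """Number of distinct sequence pairs in an all-against-all comparison."""
    return n * (n - 1) // 2


def assign(sim, medoids):
    """Assign each item to the cluster whose medoid it is most similar to."""
    return [max(range(len(medoids)), key=lambda c: sim[i][medoids[c]])
            for i in range(len(sim))]


def k_medoids(sim, initial_medoids, max_iter=100):
    """Generic k-medoids over a symmetric similarity matrix (higher = closer).

    sim[i][j] is assumed to be a precomputed pair-wise similarity, for
    example a local alignment score. Returns (labels, medoids).
    """
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(sim, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(old)  # keep the medoid of an empty cluster
                continue
            # New medoid: the member most similar to the rest of its cluster.
            new_medoids.append(max(
                members,
                key=lambda m: sum(sim[m][o] for o in members if o != m)))
        if new_medoids == medoids:
            break  # converged
        medoids = new_medoids
    return assign(sim, medoids), medoids


if __name__ == "__main__":
    print(num_pairwise_comparisons(1000))  # pairs to align for 1000 sequences
    toy = [[10, 8, 1, 0],
           [8, 10, 0, 1],
           [1, 0, 10, 7],
           [0, 1, 7, 10]]
    print(k_medoids(toy, initial_medoids=[0, 3]))
```

The number of pairs grows quadratically with the size of the data set, which is the expense described above. Like other partitioning procedures of this kind, the sketch is sensitive to the initial medoids and may settle in a poor local optimum if they are badly chosen.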

