A singular value decomposition approach for improved taxonomic classification of biological sequences

Anderson R Santos, Jan Baumbach, Artur Silva, Anderson Miyoshi, John A McCulloch, Marcos A Santos, Vasco Azevedo, Guilherme C Oliveira

doi:10.1186/1471-2164-12-s4-s11

Abstract

Background: Singular value decomposition (SVD) is a powerful technique for information retrieval; it helps uncover relationships between elements that are not prima facie related. SVD was initially developed to reduce the time needed for information retrieval and analysis of very large data sets in the complex internet environment. Since information retrieval from large-scale genome and proteome data sets has a similar level of complexity, SVD-based methods could also facilitate data analysis in this research area.

Results: We found that SVD applied to amino acid sequences demonstrates relationships and provides a basis for producing clusters and cladograms, demonstrating evolutionary relatedness of species that correlates well with Linnaean taxonomy. The choice of a reasonable number of singular values is crucial for SVD-based studies. We found that fewer singular values are needed to produce biologically significant clusters when SVD is employed. Subsequently, we developed a method to determine the lowest number of singular values and fewest clusters needed to guarantee biological significance; this system was developed and validated by comparison with Linnaean taxonomic classification.

Conclusions: By using SVD, we can reduce uncertainty concerning the appropriate rank value necessary to perform accurate information retrieval analyses. In tests, clusters that we developed with SVD perfectly matched what was expected based on Linnaean taxonomy.

Highlights

Singular value decomposition (SVD) is a powerful technique for information retrieval; it helps uncover relationships between elements that are not prima facie related
The clustering algorithms chosen were K-Means [11], Expectation Maximization (EM) [12], Adaptive Quality-based Clustering Algorithm (AQBC) [13], K-Medoids [14], and MakeDensityBasedClusterer (MDBC) [15]; they have a statistically well-founded background, have been widely used, and are available as free software packages from R [16], the Waikato Environment for Knowledge Analysis (WEKA) [15], and the Java Machine Learning Library [17]
K-Means takes an array of numbers as input and computes distances between elements to create clusters. It accepts a parameter that fixes the number of clusters to be created from the elements in the distance matrix
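
The role of the fixed cluster count in K-Means can be illustrated with a minimal sketch of Lloyd's algorithm. This is not the authors' pipeline (they used existing packages from R, WEKA, and the Java Machine Learning Library); the function name `kmeans`, the first-k-points initialization, and the toy vectors are all assumptions made for illustration.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a fixed number of clusters k.

    Assumption: the first k points serve as initial centroids, which keeps
    this sketch deterministic (real packages usually pick them at random).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical 2-D vectors standing in for rows of a numeric matrix.
vectors = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (0.1, 0.2)]
labels, centroids = kmeans(vectors, k=2)
print(labels)
```

Here `k` is the parameter that fixes how many clusters are produced; changing it changes the grouping even when the input numbers stay the same.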

Summary

Introduction

Singular value decomposition (SVD) is a powerful technique for information retrieval; it helps uncover relationships between elements that are not prima facie related. Because information retrieval from large-scale genome and proteome data sets has a similar level of complexity, SVD-based methods could facilitate data analysis in this research area. We developed an SVD-based methodology for improved inference of evolutionary relationships between amino acid sequences of different species [1]. SVD produces a revised distance matrix for a set of related elements. We chose this methodology because of SVD's proven capacity to establish non-obvious, relevant relationships among clustered elements [2][3][4][5], which provides a deterministic method for grouping related species. A matrix A can be factored by singular value decomposition.
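
The factorization named at the end of the paragraph above has a standard textbook form, given here for reference. The formula itself does not appear on this page, so the symbols below (an m × n matrix A, factors U, Σ, V, and truncation rank k) are assumed notation rather than the paper's own.

```latex
% Full SVD of an m x n real matrix A
A = U \Sigma V^{T}, \qquad
U^{T} U = I_m, \quad V^{T} V = I_n, \quad
\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \dots), \quad
\sigma_1 \ge \sigma_2 \ge \dots \ge 0

% Rank-k approximation keeping only the k largest singular values
A_k = U_k \Sigma_k V_k^{T} = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{T}
```

The rank k in the second expression is the "number of singular values" whose choice the abstract describes as crucial: a smaller k discards more of the low-weight components of A before distances between elements are computed.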

Journal: BMC Genomics	Publication Date: Dec 1, 2011
Citations: 23	License type: cc-by
