FastMG: a simple, fast, and accurate maximum likelihood procedure to estimate amino acid replacement rate matrices from large data sets.

Cuong Cao Dang, Vinh Sy Le, Bart Hazes, Quang Si Le, Olivier Gascuel

doi:10.1186/1471-2105-15-341

Abstract

Background: Amino acid replacement rate matrices are a crucial component of many protein analysis systems such as sequence similarity search, sequence alignment, and phylogenetic inference. Ideally, the rate matrix reflects the mutational behavior of the actual data under study; however, estimating amino acid replacement rate matrices requires large protein alignments and is computationally expensive and complex. As a compromise, sub-optimal pre-calculated generic matrices are typically used for protein-based phylogeny. Sequence availability has now grown to a point where problem-specific rate matrices can often be calculated if the computational cost can be controlled.

Results: The most time consuming step in estimating rate matrices by maximum likelihood is building maximum likelihood phylogenetic trees from protein alignments. We propose a new procedure, called FastMG, to overcome this obstacle. The key innovation is the alignment-splitting algorithm that splits alignments with many sequences into non-overlapping sub-alignments prior to estimating amino acid replacement rates. Experiments with different large data sets showed that the FastMG procedure was an order of magnitude faster than without splitting. Importantly, there was no apparent loss in matrix quality if an appropriate splitting procedure is used.

Conclusions: FastMG is a simple, fast and accurate procedure to estimate amino acid replacement rate matrices from large data sets. It enables researchers to study the evolutionary relationships for specific groups of proteins or taxa with optimized, data-specific amino acid replacement rate matrices. The programs, data sets, and the new mammalian mitochondrial protein rate matrix are available at http://fastmg.codeplex.com.
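The alignment-splitting idea in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible rule: sequences are shuffled and dealt into groups of at most `max_seqs` members. The function name `split_alignment`, the `max_seqs` and `seed` parameters, and the dictionary representation of an alignment are all hypothetical choices for illustration; the abstract does not say which splitting rule FastMG uses, only that an appropriate one preserves matrix quality.

```python
import random


def split_alignment(alignment, max_seqs, seed=0):
    """Partition {name: aligned_sequence} into non-overlapping sub-alignments.

    Assumption: sequences are assigned to groups at random. The splitting
    procedure in the paper may follow a different rule.
    """
    if max_seqs < 2:
        raise ValueError("max_seqs must be at least 2")
    names = sorted(alignment)
    random.Random(seed).shuffle(names)  # reproducible shuffle
    n_groups = -(-len(names) // max_seqs)  # ceiling division
    # Deal names round-robin so group sizes differ by at most one.
    groups = [names[i::n_groups] for i in range(n_groups)]
    return [{name: alignment[name] for name in group} for group in groups]


if __name__ == "__main__":
    # Toy alignment: ten sequences of identical length.
    toy = {f"seq{i}": "MKV-LA" for i in range(10)}
    for part in split_alignment(toy, max_seqs=4):
        print(sorted(part))
```

Each sub-alignment keeps all columns of the original alignment and a disjoint subset of its rows, so a tree can be built for every part independently. Because tree building is the step whose cost grows with the number of sequences, splitting is where the time saving would come from.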

Highlights

Amino acid replacement rate matrices are a crucial component of many protein analysis systems such as sequence similarity search, sequence alignment, and phylogenetic inference.
Maximum likelihood methods were designed to fully utilize the information contained in multiple protein alignments and the corresponding phylogenetic trees, which must be estimated from the data [6,7,8].
A fully automated maximum likelihood estimation procedure was proposed and used to estimate matrices from different data sets [8,10,11]. It consists of two main steps: building maximum likelihood phylogenetic trees, and estimating the matrix parameters from the multiple protein alignments and their corresponding trees.
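To make concrete what "estimating parameters" produces, the sketch below assembles a replacement rate matrix from symmetric exchangeabilities and equilibrium frequencies. This is the general time-reversible form commonly used for amino acid models. The excerpt does not state which parameterization FastMG uses, so the form itself, the function name `build_rate_matrix`, and the three-state toy alphabet (a real amino acid matrix has 20 states) are assumptions made for illustration.

```python
def build_rate_matrix(exch, freqs):
    """Build a time-reversible rate matrix Q from exchangeabilities and frequencies.

    Off-diagonal entries are Q[i][j] = exch[i][j] * freqs[j]; each diagonal
    entry makes its row sum to zero. Q is then rescaled so that the expected
    number of replacements per unit time, sum_i freqs[i] * (-Q[i][i]), is 1.
    """
    n = len(freqs)
    q = [[exch[i][j] * freqs[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 when the row is summed
    scale = -sum(freqs[i] * q[i][i] for i in range(n))
    if scale <= 0:
        raise ValueError("exchangeabilities must allow at least one replacement")
    return [[value / scale for value in row] for row in q]


if __name__ == "__main__":
    # Toy three-state example with made-up numbers.
    freqs = [0.5, 0.3, 0.2]
    exch = [[0, 1, 2],
            [1, 0, 3],
            [2, 3, 0]]
    for row in build_rate_matrix(exch, freqs):
        print(["%.4f" % value for value in row])
```

With symmetric exchangeabilities, the resulting matrix satisfies detailed balance (`freqs[i] * Q[i][j] == freqs[j] * Q[j][i]`), which is what makes the model time-reversible.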



Journal: BMC Bioinformatics
Publication Date: Oct 24, 2014
Citations: 37
License type: CC BY
