Is Over-parameterization a Problem for Profile Mixture Models?

Hector Baños, Andrew J Roger, Edward Susko

doi:10.1093/sysbio/syad063

Abstract

Biochemical constraints on the admissible amino acids at specific sites in proteins lead to heterogeneity of the amino acid substitution process over sites in alignments. It is well known that phylogenetic models of protein sequence evolution that do not account for site heterogeneity are prone to long-branch attraction (LBA) artifacts. Profile mixture models were developed to model heterogeneity of preferred amino acids at sites via a finite distribution of site classes, each with a distinct set of equilibrium amino acid frequencies. However, it is unknown whether the large number of parameters that the many amino acid frequency vectors add to such models can adversely affect tree topology estimates through over-parameterization. Here, we demonstrate theoretically that for long sequences, over-parameterization does not create problems for estimation with profile mixture models. Under mild conditions, estimates of the tree, amino acid frequencies, and other model parameters converge to their true values as sequence length increases, even when there are large numbers of components in the frequency profile distributions. Because large sample theory does not necessarily imply good behavior for shorter alignments, we explore the performance of these models with short alignments simulated with tree topologies that are prone to LBA artifacts. We find that over-parameterization is not a problem for complex profile mixture models even when there are many amino acid frequency vectors. In fact, simple models with few site classes behave poorly. Interestingly, we also find that misspecification of the amino acid frequency vectors does not lead to increased LBA artifacts as long as the estimated cumulative distribution function of the amino acid frequencies at sites adequately approximates the true one. In contrast, misspecification of the amino acid exchangeability rates can severely degrade parameter estimation.
Finally, we explore the effects of including in the profile mixture model an additional "F-class" representing the overall frequencies of amino acids in the data set. Surprisingly, the F-class does not help parameter estimation significantly and can decrease the probability of correct tree estimation, depending on the scenario, even though it tends to improve likelihood scores.
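
As a rough illustration of the model class the abstract describes, the sketch below represents a profile mixture as weighted frequency vectors, appends an F-class built from overall amino acid frequencies, and evaluates the mixture's cumulative distribution function for one amino acid's site frequency. The three-state alphabet, the weights and profiles, the function names (`add_f_class`, `frequency_cdf`), and the way the F-class weight is set are all illustrative assumptions, not details taken from the paper.

```python
# Toy sketch of a profile mixture (hypothetical names and numbers throughout).
# A real protein model uses 20 amino acids; three states keep the example short.

def normalize(values):
    """Rescale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

def add_f_class(weights, profiles, overall_freqs, f_weight):
    """Append an 'F-class' whose profile is the data set's overall frequencies.

    Existing class weights are rescaled so that all weights still sum to 1.
    Fixing the F-class weight by hand is an assumption made for illustration.
    """
    scaled = [w * (1.0 - f_weight) for w in weights]
    return scaled + [f_weight], profiles + [normalize(overall_freqs)]

def frequency_cdf(weights, profiles, state, x):
    """P(equilibrium frequency of `state` at a random site <= x) under the mixture."""
    return sum(w for w, p in zip(weights, profiles) if p[state] <= x)

# Three site classes, each with its own equilibrium frequency vector.
weights = [0.5, 0.3, 0.2]
profiles = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]
# Hypothetical overall state frequencies observed in an alignment.
overall = [0.45, 0.33, 0.22]

w_f, p_f = add_f_class(weights, profiles, overall, f_weight=0.1)
cdf_value = frequency_cdf(weights, profiles, state=0, x=0.5)
```

Two mixtures with different frequency vectors can still give similar values of `frequency_cdf` across states and thresholds, which is the sense in which the abstract compares an estimated distribution of site frequencies with the true one.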

Journal: Systematic Biology
Publication Date: Oct 16, 2023
Citations: 3
License type: CC BY-NC-ND 4.0

