K-Pax2: Bayesian identification of cluster-defining amino acid positions in large sequence datasets.

Alberto Pessia, Jukka Corander, Sarah Cobey, Juha Santeri Puranen, Yonatan Grad

doi:10.1099/mgen.0.000025

Abstract

The recent growth in publicly available sequence data has introduced new opportunities for studying microbial evolution and spread. Because the pace of sequence accumulation tends to exceed the pace of experimental studies of protein function and the roles of individual amino acids, statistical tools to identify meaningful patterns in protein diversity are essential. Large sequence alignments from fast-evolving micro-organisms are particularly challenging to dissect using standard tools from phylogenetics and multivariate statistics, because biologically relevant functional signals are easily masked by neutral variation and noise. To meet this need, a novel computational method is introduced that is easily executed in parallel on a computing cluster and can handle thousands of sequences with minimal subjective input from the user. The usefulness of this machine-learning approach is demonstrated by applying it to nearly 5000 haemagglutinin sequences of influenza A/H3N2. Antigenic and 3D structural mapping of the results shows that the method can recover the major jumps in antigenic phenotype that occurred between 1968 and 2013 and can identify specific amino acids associated with these changes. The method is expected to provide a useful tool for uncovering patterns of protein evolution.
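To make the idea of a "cluster-defining amino acid position" concrete, here is a minimal sketch under stated assumptions. It is not the K-Pax2 model: it is a generic Dirichlet-multinomial Bayes factor that, for each alignment column, compares one residue distribution shared by all sequences against a separate distribution per cluster. The cluster labels are assumed to be given in advance, and the alphabet (20 amino acids plus gap), the symmetric prior `alpha = 1.0`, and the function names `log_marginal`, `log_bayes_factor` and `rank_positions` are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import lgamma

# Assumption: 20 standard amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def log_marginal(column, alpha=1.0, k=len(ALPHABET)):
    """Log marginal likelihood of an ordered list of residues under a
    categorical model with a symmetric Dirichlet(alpha) prior over k symbols."""
    counts = Counter(column)
    n = len(column)
    value = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts.values():
        value += lgamma(alpha + c) - lgamma(alpha)
    return value


def log_bayes_factor(column, labels, alpha=1.0):
    """Log evidence for cluster-specific distributions versus one shared one.
    Positive values suggest the position helps separate the clusters."""
    groups = {}
    for residue, label in zip(column, labels):
        groups.setdefault(label, []).append(residue)
    split = sum(log_marginal(g, alpha) for g in groups.values())
    shared = log_marginal(column, alpha)
    return split - shared


def rank_positions(alignment, labels, alpha=1.0):
    """Score every column of an aligned set of sequences (all the same length)
    and return (position, score) pairs, highest score first."""
    if len(alignment) != len(labels):
        raise ValueError("need exactly one cluster label per sequence")
    scores = []
    for j in range(len(alignment[0])):
        column = [seq[j] for seq in alignment]
        scores.append((j, log_bayes_factor(column, labels, alpha)))
    # sorted() is stable, so ties keep their original column order.
    return sorted(scores, key=lambda t: t[1], reverse=True)


ranked = rank_positions(["AKL", "AKL", "ARL", "ARL"], [0, 0, 1, 1])
print(ranked[0][0])  # column 1 (K vs R) ranks first
```

In this toy alignment, only column 1 differs between the two clusters, so it is the only column with a positive score. The invariant columns score below zero because splitting them spends prior mass without improving the fit, which is the kind of built-in penalty against over-partitioning that a Bayesian formulation provides.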


Journal: Microbial Genomics
Publication Date: Jul 15, 2015
Citations: 42
License type: CC BY-NC 3.0

Similar Papers

Confirming the phylogeny of mammals by use of large comparative sequence data sets.
Arjun B Prasad ... Marc W Allard
Molecular Biology and Evolution | VOL. 25
02 May 2008

DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment.
Erik S Wright
BMC Bioinformatics | VOL. 16
06 Oct 2015

Amino acid impact factor.
C K Sruthi ... Alexandre G De Brevern
PloS one | VOL. 13
13 Jun 2018

Author response: COVID-19 CG enables SARS-CoV-2 mutation and lineage tracking by locations and dates of interest
Albert Tian Chen ... Shing Hei Zhan
22 Jan 2021