A 'phylogeny-aware' multi-objective optimization approach for computing MSA

Muhammad Ali Nayeem, Md Shamsuzzoha Bayzid, Atif Hasan Rahman, Rifat Shahriyar, M Sohel Rahman

doi:10.1145/3321707.3321773

Abstract

Multiple sequence alignment (MSA) is a basic step in many analyses in bioinformatics, including predicting the structure and function of proteins, orthology prediction, and estimating phylogenies. The objective of MSA is to infer the homology among the sequences of chosen species. Commonly, MSAs are inferred by optimizing a single objective function. The alignments estimated under one criterion may differ from the alignments generated by other criteria, inferring discordant homologies and thus leading to different evolutionary histories relating the sequences. To address this issue, researchers have recently advocated a multi-objective formulation of MSA, in which multiple conflicting objective functions are optimized simultaneously to generate a set of alignments. However, no theoretical or empirical justification with respect to a real-life application has been shown for any particular multi-objective formulation. In this study, we investigate the impact of multi-objective formulation in the context of phylogenetic tree estimation. In essence, we ask whether a phylogeny-aware metric can guide us in choosing appropriate multi-objective formulations. Employing evolutionary optimization, we demonstrate that trees estimated on the alignments generated by multi-objective formulation are substantially better than the trees estimated by state-of-the-art MSA tools, including PASTA, T-Coffee, and MAFFT.

Full Text