Link Your Sites (LYS) Scripts: Automated Search of Protein Structures and Mapping of Sites Under Positive Selection Detected by PAML

Lys Sanz Moreta, Rute R Da Fonseca

doi:10.1007/s11692-020-09507-9

Open Access

Journal: Evolutionary Biology
Publication Date: Jun 30, 2020
License type: cc-by

Affiliation: University of Copenhagen

Abstract

Visualizing the molecular context of an amino acid mutation in a protein structure is crucial for assessing its functional impact and understanding its evolutionary implications. Currently, searches for fast-evolving amino acid positions using codon substitution models such as those implemented in PAML (Yang and Nielsen in Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol 17(1):32–43, 2000; Zhang et al. in Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol 22(12):2472–2479, 2005) are run on almost complete proteomes. They generate large numbers of candidate proteins, which makes the analysis of individual protein structures and models very time-consuming. Here we present the package Link Your Sites (LYS), which can be used to reduce the number of analysed targets to those for which structural information can be retrieved. LYS consists of two Python wrapper scripts. The first one (i) mines the RCSB database (Berman et al. in The protein data bank. Nucleic Acids Res 28(1):235–242, 2000) using the BLAST alignment tool to find the best-matching homologous sequences, (ii) fetches their domain positions using PROSITE (Hamelryck and Manderick in Pdb file parser and structure class implemented in python. Bioinformatics 19(17):2308–2310, 2003; Sigrist et al. in Prosite: a documented database using patterns and profiles as motif descriptors. Brief Bioinf 3(3):265–274, 2002; Sigrist et al. in New and continuing developments at prosite. Nucleic Acids Res 41(D1):D344–D347, 2012), (iii) parses the PAML output, extracts the positions of fast-evolving sites and transforms them into the coordinate system of the protein structure, and (iv) outputs one file per gene with the correspondence between positions in the input sequence and in the homologous structure.
The second script produces figures for use in publications that highlight the positively selected sites mapped onto regions known to have functional relevance.
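
The coordinate transformation in step (iii) can be illustrated with a minimal sketch. This is not the LYS implementation. It assumes that a pairwise alignment between the input sequence and the structure sequence is already available as two gapped strings, and that the structure residues are numbered consecutively from a known start (real PDB files may contain numbering gaps and insertion codes). The function names `map_sites_to_structure` and `translate_sites` are hypothetical.

```python
def map_sites_to_structure(aligned_query, aligned_structure, structure_start=1):
    """Map 1-based query positions to structure residue numbers.

    Both arguments are rows of the same pairwise alignment: equal length,
    with '-' marking gaps. Query residues aligned to a gap in the structure
    have no equivalent and are left out of the returned dictionary.
    Assumption: structure residues are numbered consecutively.
    """
    if len(aligned_query) != len(aligned_structure):
        raise ValueError("aligned sequences must have equal length")

    mapping = {}
    query_pos = 0                        # last query residue seen (1-based)
    structure_pos = structure_start - 1  # last structure residue seen

    for q_char, s_char in zip(aligned_query, aligned_structure):
        if q_char != "-":
            query_pos += 1
        if s_char != "-":
            structure_pos += 1
        if q_char != "-" and s_char != "-":
            mapping[query_pos] = structure_pos
    return mapping


def translate_sites(selected_sites, mapping):
    """Return {query_site: structure_residue or None} for each selected site."""
    return {site: mapping.get(site) for site in selected_sites}


# Toy alignment; the structure is assumed to start at residue 10.
mapping = map_sites_to_structure("MKT-AYIAK", "-KTGAY-AK", structure_start=10)
print(translate_sites([1, 4, 6, 8], mapping))
```

Sites that fall in an alignment gap map to `None`, which makes explicit which selected positions have no structural equivalent.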

Highlights

One of the goals of comparative genomics studies is to find regions of the genome that evolve at elevated rates, which can potentially indicate that they are involved in promoting adaptation to new environments.
Positive selection is evaluated through the ω value, which corresponds to the ratio between the number of non-synonymous mutations per non-synonymous site and the number of synonymous mutations per synonymous site.
A first step in evaluating the impact of these mutations consists of identifying their location on a protein structure and verifying whether they are located within known functional domains.
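
The ω ratio and the domain check described in the highlights above can be sketched in a few lines. This is an illustrative sketch, not code from LYS or PAML. The function names, the `(name, start, end)` domain tuples and the example domains are assumptions. The interpretation used is the standard one: ω > 1 suggests positive selection, ω = 1 neutral evolution, and ω < 1 purifying selection.

```python
def omega(dn, ds):
    """Return the ratio dN/dS.

    dn: non-synonymous substitutions per non-synonymous site
    ds: synonymous substitutions per synonymous site
    """
    if ds <= 0:
        raise ValueError("dS must be positive for the ratio to be defined")
    return dn / ds


def classify(omega_value):
    """Interpret an omega value with the standard thresholds."""
    if omega_value > 1:
        return "positive"
    if omega_value < 1:
        return "purifying"
    return "neutral"


def sites_in_domains(sites, domains):
    """Return {site: [domain names]} for sites inside any domain.

    domains is a list of (name, start, end) tuples with inclusive,
    1-based residue ranges (a hypothetical layout for this sketch).
    """
    hits = {}
    for site in sites:
        names = [name for name, start, end in domains if start <= site <= end]
        if names:
            hits[site] = names
    return hits


# Hypothetical domains and selected sites, for illustration only.
domains = [("kinase", 20, 120), ("SH2", 150, 230)]
print(classify(omega(0.5, 0.25)))               # omega = 2.0
print(sites_in_domains([5, 45, 200], domains))
```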

Summary

Introduction

One of the goals of comparative genomics studies is to find regions of the genome that evolve at elevated rates, which can potentially indicate that they are involved in promoting adaptation to new environments. Such regions are said to be evolving under positive selection [10]. Non-synonymous mutations can be relevant if the amino acid substitution changes the physicochemical properties of the residue and affects the protein’s function. If residues in a functional domain are replaced by amino acids with different properties, the interactions they take part in are modified along with the structure, and the protein’s binding attributes are affected [2]. Mutations in a functional domain are more likely to affect the protein’s function than those located in other parts of the structure.
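
As a rough illustration of the point about physicochemical properties, a substitution can be flagged when the two residues fall into different property classes. The four-class grouping below is one common textbook simplification and is an assumption of this sketch, not a classification taken from the paper. The names `PROPERTY_CLASS` and `changes_property_class` are hypothetical.

```python
# Coarse grouping of the 20 standard amino acids (one-letter codes).
# Assumption: a simplified textbook classification; other schemes exist.
_GROUPS = {
    "nonpolar": "AVLIMFWPG",
    "polar": "STCYNQ",
    "positive": "KRH",
    "negative": "DE",
}

PROPERTY_CLASS = {
    residue: label for label, residues in _GROUPS.items() for residue in residues
}


def changes_property_class(ref, alt):
    """Return True if the substitution ref -> alt crosses property classes."""
    ref, alt = ref.upper(), alt.upper()
    return PROPERTY_CLASS[ref] != PROPERTY_CLASS[alt]


print(changes_property_class("D", "E"))  # both negatively charged
print(changes_property_class("L", "K"))  # nonpolar to positively charged
```

A check like this is only a first filter. Whether a substitution matters in practice depends on its structural context, which is why mapping the site onto a structure is needed.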

Methods

Results

Conclusion