Abstract

When changes at a few amino acid sites are the target of selection, adaptive amino acid changes in protein sequences can be identified using maximum-likelihood methods based on models of codon substitution (such as those implemented in codeml). Although such methods have been employed numerous times in a variety of organisms, the time needed to collect the data and prepare the input files means that usually only tens or hundreds of coding regions are analyzed. The recent availability of flexible and easy-to-use computer applications that collect relevant data (such as BDBM) and infer positively selected amino acid sites (such as ADOPS) means that the entire process is now easier and quicker than before. Until now, however, the lack of a batch option in ADOPS has precluded the analysis of hundreds or thousands of sequence files; such an option is reported here. Given the interest in, and possibility of, running such large-scale projects, we have also developed a database where ADOPS projects can be stored. This study therefore also presents the B+ database, which is both a data repository and a convenient interface for viewing the information contained in ADOPS projects without the need to download and unzip the corresponding ADOPS project file. The ADOPS projects available at B+ can also be downloaded, unzipped, and opened using the ADOPS graphical interface. The availability of such a database ensures that results are repeatable, promotes data reuse with significant savings in the time needed to prepare datasets, and allows effortless further exploration of the data contained in ADOPS projects.

Highlights

  • Amino acid changes in protein sequences can be adaptive, and when changes at a few amino acid sites are the target of selection they can be detected using maximum-likelihood methods based on models of codon substitution [1,2,3]

  • This approach has been applied numerous times to infer positively selected amino acid sites at numerous proteins including, but not limited to: interleukin-3 (IL3), a protein associated with brain volume variation in general human populations [4]; formyl peptide receptors in mammals [5]; scorpion sodium channel toxins [6]; the Mimulus plant CENH3 protein [7]; the oyster Crassostrea gigas peptidoglycan recognition proteins [8]; host immune response genes [9, 10]; the envelope glycoprotein of dengue viruses [11]; the attachment glycoprotein of respiratory syncytial virus [12]; measles virus hemagglutinin [13]; influenza B virus hemagglutinin [14]; HIV proteins [15]; hemagglutinin-neuraminidase protein of Newcastle disease virus [16]; Trypanosoma brucei proteins [17]; the vertebrate skeletal muscle sodium

  • We present B+, a database that has been designed to store and show the information contained in Automatic Detection of Positively Selected Sites (ADOPS) project files

Summary

Introduction

Amino acid changes in protein sequences can be adaptive, and when changes at a few amino acid sites are the target of selection they can be detected using maximum-likelihood methods based on models of codon substitution [1,2,3]. Although these methods have been widely used to infer positively selected amino acid sites, the size of the average project is still relatively small, mainly because of the time needed to collect the relevant coding sequences and prepare input files for the different software applications. A database dedicated to positive selection inferences at the codon level has already been published [28], but it is dedicated to a specific group of organisms, and reusing its data is not as easy as with B+ and ADOPS. Both large and small ADOPS datasets can be submitted to B+ (as compressed tar.gz files) along with a description detailing how the project was performed. The B+ database hosts the “Closely related Drosophila dataset (2016)”, which provides ADOPS projects for 19,652 Drosophila transcripts, 14.6% of which show signs of positive selection (1200 genes). Curated analyses must still be performed to validate these results.
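As a rough illustration of the tar.gz packaging step mentioned above, the following Python sketch compresses a project directory and lists the archive contents without extracting it. The directory name `example_project` and the file `sequences.fasta` are hypothetical placeholders; the actual layout of an ADOPS project and the B+ submission procedure are not described here and are not assumed.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_project(project_dir: Path, archive_path: Path) -> Path:
    """Compress a project directory into a .tar.gz archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores only the directory name, not its absolute path
        tar.add(project_dir, arcname=project_dir.name)
    return archive_path


def list_project(archive_path: Path) -> list[str]:
    """Return the sorted member names of a .tar.gz archive without extracting it."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


def demo() -> list[str]:
    """Build a throwaway project (hypothetical layout), pack it, and list it."""
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "example_project"  # hypothetical project name
        project.mkdir()
        # Hypothetical content; a real project would hold the analysis files
        (project / "sequences.fasta").write_text(">seq1\nATGGCC\n")
        archive = pack_project(project, Path(tmp) / "example_project.tar.gz")
        return list_project(archive)


if __name__ == "__main__":
    for name in demo():
        print(name)
```

Listing the members first is a cheap way to check that an archive contains a single top-level project directory before it is shared or unpacked.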

ADOPS Batch Mode
Implementation
Database Exploration
Management Interface
Usefulness
Findings
Conclusion