Abstract

Over the last two decades, various in silico approaches have been developed and refined that attempt to identify protein and/or peptide vaccine candidates from informative signals encoded in the protein sequences of a target pathogen. To date, no signal has been identified that clearly indicates whether a protein will contribute effectively to a protective immune response in a host. The premise of this study is that proteins under positive selection by the immune system are more likely to be suitable vaccine candidates than proteins exposed to other selection pressures. Furthermore, we expect protein sequence regions encoding major histocompatibility complex (MHC)-binding peptides to contain consecutive positive selection sites. Using freely available data and bioinformatic tools, we present a high-throughput pipeline that predicts positive selection sites, protein subcellular locations, and the sequence locations of medium- to high-affinity T-cell MHC class I-binding peptides. Positive selection sites are estimated from a sequence alignment by comparing the rates of synonymous (dS) and non-synonymous (dN) substitutions among the protein-coding sequences of orthologous genes in a phylogeny. The main pipeline output is a list of protein vaccine candidates predicted to be naturally exposed to the immune system and to contain sites under positive selection. Candidates are ranked by the number of consecutive positive selection sites located in protein sequence regions encoding MHC I-binding peptides. Results are constrained by the reliability of the prediction programs and the quality of the input data. Protein sequences from the Toxoplasma gondii ME49 strain (TGME49) were used as a case study. Surface antigen (SAG), dense granule (GRA), microneme (MIC), and rhoptry (ROP) proteins are considered worthy T. gondii candidates. Of 8263 TGME49 protein sequences processed anonymously, the top 10 predicted candidates were all worthy candidates.
In particular, the top ten included ROP5 and ROP18, which are T. gondii virulence determinants. The chance of randomly selecting a ROP protein was 0.2% given 8263 sequences. We conclude that the approach described is a valuable addition to other in silico approaches for identifying vaccine candidates worthy of laboratory validation, and that it could be adapted for other apicomplexan parasite species (given appropriate data).
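
The positive-selection criterion above rests on the ratio ω = dN/dS: ω > 1 at a codon site suggests positive (diversifying) selection, ω ≈ 1 suggests neutral evolution, and ω < 1 suggests purifying selection. The following is a minimal Python sketch, assuming per-site dN and dS estimates have already been produced by some codon-model program. The `SiteRate` type, the function name, and the plain cutoff of 1 are hypothetical illustrations, not the pipeline's implementation; codon-model programs typically assess sites statistically (for example with posterior probabilities) rather than with a raw ratio threshold.

```python
from dataclasses import dataclass


@dataclass
class SiteRate:
    position: int  # 1-based codon (= amino acid) position in the alignment
    dn: float      # non-synonymous substitution rate estimated at this site
    ds: float      # synonymous substitution rate estimated at this site


def positive_selection_sites(sites, threshold=1.0):
    """Return positions whose omega = dN/dS exceeds the threshold (hypothetical rule)."""
    selected = []
    for site in sites:
        if site.ds == 0:
            # omega is undefined when dS is zero; skip instead of dividing by zero
            continue
        if site.dn / site.ds > threshold:
            selected.append(site.position)
    return selected


sites = [
    SiteRate(1, 0.2, 0.4),  # omega = 0.5, purifying
    SiteRate(2, 0.9, 0.3),  # omega = 3.0, positive
    SiteRate(3, 0.5, 0.0),  # dS = 0, skipped
    SiteRate(4, 1.2, 0.6),  # omega = 2.0, positive
]
print(positive_selection_sites(sites))  # [2, 4]
```

Skipping sites with dS = 0 is a conservative design choice for the sketch; a real analysis would handle such sites within the statistical model.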

Highlights

  • Since the inception of reverse vaccinology (Rappuoli, 2000) almost two decades ago, researchers have applied various in silico approaches to identify protein and/or peptide vaccine candidates worthy of laboratory validation

  • We present an approach that first predicts the T. gondii proteins that are naturally exposed to the immune system and contain sites under positive selection, and then ranks these candidates by the number of consecutive positive selection sites located on epitopes

  • A total of 8322 protein sequences along with their originating mRNA sequences were downloaded from EupathDB for the test case target species, T. gondii ME49
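
One possible reading of the ranking step (the number of consecutive positive selection sites located on epitopes) is sketched below in Python: a positive selection site counts toward a protein's score only if it lies inside a predicted MHC I-binding peptide and is adjacent to another site that also lies inside one. The interval format (1-based, inclusive), the function names, and the protein IDs are hypothetical; the pipeline's exact scoring rule is not specified here.

```python
def consecutive_sites_in_peptides(sites, peptides):
    """Count positive selection sites that lie inside a predicted MHC I-binding
    peptide and have a neighbour (position +/- 1) that also lies inside one.

    sites: iterable of 1-based residue positions under positive selection
    peptides: iterable of (start, end) 1-based inclusive peptide intervals
    """
    covered = set()
    for start, end in peptides:
        covered.update(range(start, end + 1))
    in_peptides = set(sites) & covered
    return sum(
        1 for pos in in_peptides
        if pos - 1 in in_peptides or pos + 1 in in_peptides
    )


def rank_candidates(proteins):
    """Rank proteins by score, highest first; ties are broken by protein ID.

    proteins: dict of protein ID -> (sites, peptides)
    """
    scored = [
        (pid, consecutive_sites_in_peptides(sites, peptides))
        for pid, (sites, peptides) in proteins.items()
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical proteins: (positive selection sites, predicted peptide intervals)
proteins = {
    "protA": ([10, 11, 12, 40], [(8, 16)]),
    "protB": ([5, 7, 30, 31], [(1, 9), (28, 36)]),
    "protC": ([100], [(95, 103)]),
}
print(rank_candidates(proteins))  # [('protA', 3), ('protB', 2), ('protC', 0)]
```

Here protB scores 2 because only sites 30 and 31 form a consecutive run inside a peptide; sites 5 and 7 are inside a peptide but not adjacent.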


Summary

Introduction

Since the inception of reverse vaccinology (Rappuoli, 2000) almost two decades ago, researchers have applied various in silico approaches to identify protein and/or peptide vaccine candidates worthy of laboratory validation. These approaches have been reviewed previously (Bowman et al, 2011; Jones, 2012; Donati and Rappuoli, 2013; Rappuoli et al, 2016). There is no proven set of characteristics that identifies a vaccine candidate; however, the general consensus in the community is that a valuable characteristic is one indicating that a protein is either external to the pathogen or located on, or in, its membrane. We investigate the amino acid substitution rate of a protein as an additional characteristic supporting its vaccine candidacy.
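
The location criterion can be combined with the positive-selection criterion as a simple filter, mirroring the pipeline output described in the abstract (proteins predicted to be exposed to the immune system that also contain positive selection sites). This is a minimal Python sketch, assuming a subcellular location has already been predicted for each protein; the labels in `EXPOSED_LOCATIONS`, the function name, and the protein IDs are hypothetical and do not correspond to the output of any particular prediction tool.

```python
# Hypothetical location labels treated as "exposed to the immune system":
# outside the pathogen, or on/in its membrane.
EXPOSED_LOCATIONS = {"extracellular", "cell surface", "plasma membrane"}


def exposed_candidates(predicted_locations, positive_sites):
    """Return IDs of proteins predicted to be exposed that also have at
    least one positive selection site, sorted alphabetically.

    predicted_locations: dict of protein ID -> predicted location label
    positive_sites: dict of protein ID -> list of positive selection positions
    """
    return sorted(
        pid
        for pid, location in predicted_locations.items()
        if location in EXPOSED_LOCATIONS and positive_sites.get(pid)
    )


locations = {
    "protA": "extracellular",
    "protB": "cytoplasm",
    "protC": "plasma membrane",
    "protD": "extracellular",
}
sites = {"protA": [12, 13], "protB": [5], "protC": [], "protD": [40]}
print(exposed_candidates(locations, sites))  # ['protA', 'protD']
```

protB is excluded by location and protC by having no positive selection sites; an empty or missing site list is treated as "no evidence of positive selection".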
