Abstract

The diamondback moth, Plutella xylostella (L.), is the major cosmopolitan pest of brassica and other cruciferous crops. Its larval midgut is a dynamic tissue that interfaces with a wide variety of toxicological and physiological processes. The draft sequence of the P. xylostella genome was recently released, but its annotation remains challenging because of the low sequence coverage of this branch of life and the poor description of exon/intron splicing rules for these insects. Peptide sequencing by computational assignment of tandem mass spectra to genome sequence information provides an experimentally independent approach for confirming or refuting protein predictions, a concept that has been termed proteogenomics. In this study, we carried out an in-depth proteogenomic analysis to complement the genome annotation of P. xylostella, using shotgun HPLC-ESI-MS/MS data from the larval midgut processed with a multialgorithm pipeline. A total of 876,341 tandem mass spectra were searched against the predicted P. xylostella protein sequences and a whole-genome six-frame translation database. Based on a data set comprising 2694 novel peptides specific to the genome search, we discovered 439 novel protein-coding genes and corrected 128 existing gene models. To obtain the most accurate data to seed further insect genome annotation, more than half of the novel protein-coding genes (235 of 439) were further validated by RT-PCR amplification and sequencing of the corresponding transcripts. Furthermore, we validated 53 novel alternative splicing events. In total, 6764 proteins were identified, making this one of the most comprehensive proteogenomic studies of a nonmodel animal. As the first tissue-specific proteogenomic analysis of P. xylostella, this study provides the fundamental basis for high-throughput proteomics and functional genomics approaches aimed at deciphering the molecular mechanisms of resistance and controlling this pest.

Highlights

  • From the ‡Department of Plant Protection, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China; §BGI-Shenzhen, Shenzhen, 518083, China; ¶CEA-Marcoule, DSV/IBITEC-S/SPI/Li2D, Laboratory, BP 17171, F-30200, Bagnols-sur-Ceze, F-30207, France

  • Overview of MS/MS Data for Multialgorithmic Proteogenomic Analysis: The goal of this study was to carry out in-depth proteomic profiling to map the proteome of the larval midgut of P. xylostella at a global scale using a proteogenomic approach.

  • For the diamondback moth (DBM) genome database (DBMDB), many redundant genomic regions were characterized as “n” to further assemble the DBM genome with fewer scaffolds.

Introduction

Conventional approaches used to identify protein-coding genes are highly dependent on predictions based on computational algorithms and homology searches against known proteins. Nonmodel organisms, such as most arthropods, are by nature distantly related to well-studied organisms [1], which limits the value of homology-based prediction. The use of information from deep sequencing of mRNA-derived cDNA libraries (RNA-seq) or expressed sequence tag (EST) libraries can dramatically improve confidence in the genome annotation [2,3,4]. However, this analysis remains at the transcript level and in many cases cannot make a crystal-clear distinction between coding and noncoding sequences. Other sources of evidence can be incorporated in the proteogenomics procedure.
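The whole-genome six-frame translation database mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`six_frame_translation`, `find_peptide`) and the toy sequence are hypothetical, the standard genetic code is assumed, and codons containing ambiguous bases such as "N" are translated to "X". The idea it shows is that a nucleotide sequence is translated in three reading frames on each strand, so that a peptide identified by MS/MS can be located without relying on any predicted gene model.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Map frame labels (+1..+3, -1..-3) to translated sequences."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


def find_peptide(seq, peptide):
    """Return the frame labels whose translation contains the peptide."""
    return [
        frame
        for frame, protein in six_frame_translation(seq).items()
        if peptide in protein
    ]


if __name__ == "__main__":
    toy = "ATGGCCAAATAG"  # hypothetical toy sequence, not real genome data
    for frame, protein in six_frame_translation(toy).items():
        print(frame, protein)
    print(find_peptide(toy, "MAK"))
```

In a real proteogenomic search, the translated frames would typically be split at stop codons into open reading frames and written to a FASTA file for the search engines; that step is omitted here for brevity.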
