Abstract

Accurate annotation of protein-coding genes is one of the primary tasks upon the completion of whole genome sequencing of any organism. In this study, we used an integrated transcriptomic and proteomic strategy to validate and improve the existing zebrafish genome annotation. We undertook high-resolution mass-spectrometry-based proteomic profiling of 10 adult organs, whole adult fish body, and two developmental stages of zebrafish (SAT line), in addition to transcriptomic profiling of six organs. More than 7,000 proteins were identified from proteomic analyses, and ∼69,000 high-confidence transcripts were assembled from the RNA sequencing data. Approximately 15% of the transcripts mapped to intergenic regions, the majority of which are likely long non-coding RNAs. These high-quality transcriptomic and proteomic data were used to manually reannotate the zebrafish genome. We report the identification of 157 novel protein-coding genes. In addition, our data led to modification of existing gene structures including novel exons, changes in exon coordinates, changes in frame of translation, translation in annotated UTRs, and joining of genes. Finally, we discovered four instances of genome assembly errors that were supported by both proteomic and transcriptomic data. Our study shows how an integrative analysis of the transcriptome and the proteome can extend our understanding of even well-annotated genomes.

Highlights

  • More than 7,000 proteins were identified from proteomic analyses, and ∼69,000 high-confidence transcripts were assembled from the RNA sequencing data

  • Our previous efforts have successfully demonstrated the power of proteogenomic analyses in improving genome annotation, as exemplified by studies on Mycobacterium tuberculosis, Candida glabrata, Leishmania donovani, Anopheles gambiae, and Homo sapiens (7–11)

  • 90 million reads were obtained per organ. We chose these tissues in order to complement transcriptomic data from ovary, whole body, and developmental stages that have already been integrated into genome annotations from Ensembl (2)

Summary

Introduction

We used an integrated transcriptomic and proteomic strategy to validate and improve the existing zebrafish genome annotation. We undertook high-resolution mass-spectrometry-based proteomic profiling of 10 adult organs, whole adult fish body, and two developmental stages of zebrafish (SAT line), in addition to transcriptomic profiling of six organs. Although the alternative gene and transcript models generated from transcript data alone could potentially be used to propose many changes in the genome annotation, in the manual annotation process adopted by our group, only those RNA-Seq-based revisions that were supported by peptide evidence were considered.
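
The acceptance rule described above, in which an RNA-Seq-based revision is kept only if peptide evidence supports it, can be illustrated with a minimal sketch. This is not the pipeline used in the study: the `Revision` and `Peptide` records, their fields, the example coordinates, and the simple coordinate-overlap test are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """Hypothetical record: a gene-model change proposed from RNA-Seq data."""
    gene_id: str
    kind: str    # e.g. "novel_exon", "frame_change", "utr_translation"
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


@dataclass(frozen=True)
class Peptide:
    """Hypothetical record: a peptide identification mapped to the genome."""
    sequence: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def overlaps(rev: Revision, pep: Peptide) -> bool:
    """True if the peptide maps to the same chromosome and overlaps the revision."""
    return (
        rev.chrom == pep.chrom
        and pep.start <= rev.end
        and pep.end >= rev.start
    )


def peptide_supported(revisions, peptides):
    """Keep only the revisions overlapped by at least one mapped peptide."""
    return [r for r in revisions if any(overlaps(r, p) for p in peptides)]


# Made-up example data, for illustration only.
revisions = [
    Revision("geneA", "novel_exon", "chr1", 100, 200),
    Revision("geneB", "utr_translation", "chr1", 500, 600),
]
peptides = [Peptide("PEPTIDEK", "chr1", 150, 173)]

kept = peptide_supported(revisions, peptides)
print([r.gene_id for r in kept])  # prints ['geneA']
```

Here overlap of genomic coordinates stands in for "peptide evidence". A real workflow would also need to account for strand, reading frame, and splice junctions, which this sketch ignores.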
