Abstract

Background

As more and more genomes are sequenced, genome annotation becomes increasingly important in bridging the gap between sequence and biology. Gene prediction, which is at the center of genome annotation, usually integrates various resources to compute consensus gene structures. However, many newly sequenced genomes have limited resources for gene predictions. In an effort to create high-quality gene models of the cucumber genome (Cucumis sativus var. sativus), based on the EVidenceModeler gene prediction pipeline, we incorporated the massively parallel complementary DNA sequencing (RNA-Seq) reads of 10 cucumber tissues into EVidenceModeler. We applied the new pipeline to the reassembled cucumber genome and included a comparison between our predicted protein-coding gene sets and a published set.

Results

The reassembled cucumber genome, annotated with RNA-Seq reads from 10 tissues, has 23,248 identified protein-coding genes. Compared with the published prediction in 2009, approximately 8,700 genes reveal structural modifications and 5,285 genes only appear in the reassembled cucumber genome. All the related results, including genome sequence and annotations, are available at http://cmb.bnu.edu.cn/Cucumis_sativus_v20/.

Conclusions

We conclude that RNA-Seq greatly improves the accuracy of prediction of protein-coding genes in the reassembled cucumber genome. The comparison between the two gene sets also suggests that it is feasible to use RNA-Seq reads to annotate newly sequenced or less-studied genomes.

Highlights

  • Genome reassembly: Using the improved SOAPdenovo program [18] (Release 1.04), we reassembled the cucumber genome by integrating additional large-insert paired-end Illumina GA reads from Cucumis sativus var. sativus (7.4-fold genome coverage, 5 Kb insert size) and from Cucumis sativus var. hardwickii (3.8-fold, 5 Kb insert size; 3.2-fold, 10 Kb insert size; see Additional file 1, Table S1 for details).

  • Reconstructing transcripts from RNA-Seq by de novo assembly and ‘align-then-assemble’ approaches: We obtained about 220 million Solexa/Illumina RNA-Seq reads from poly(A) RNAs extracted from 10 cucumber tissues (Table 1).


Summary

Introduction

As more and more genomes are sequenced, genome annotation becomes increasingly important in bridging the gap between sequence and biology. Gene prediction, which is at the center of genome annotation, usually integrates various resources to compute consensus gene structures. In an effort to create high-quality gene models of the cucumber genome (Cucumis sativus var. sativus), based on the EVidenceModeler gene prediction pipeline, we incorporated the massively parallel complementary DNA sequencing (RNA-Seq) reads of 10 cucumber tissues into EVidenceModeler. Gene prediction, within the process of genome annotation, is a complex endeavor. In eukaryotic species, it is usually carried out by integrating multiple sources of evidence [4], such as complementary DNA (cDNA), proteins in closely related species, and de novo predictions [5]. Until recently, the sequencing of cDNA was a laborious and capital-intensive task.
