Abstract

Background: De novo assembly of transcript sequences produced by short-read DNA sequencing technologies offers a rapid approach to obtain expressed gene catalogs for non-model organisms. A draft genome sequence will be produced in 2010 for a Eucalyptus tree species (E. grandis) representing the most important hardwood fibre crop in the world. Genome annotation of this valuable woody plant and genetic dissection of its superior growth and productivity will be greatly facilitated by the availability of a comprehensive collection of expressed gene sequences from multiple tissues and organs.

Results: We present an extensive expressed gene catalog for a commercially grown E. grandis × E. urophylla hybrid clone constructed using only Illumina mRNA-Seq technology and de novo assembly. A total of 18,894 transcript-derived contigs, a large proportion of which represent full-length protein-coding genes, were assembled and annotated. Analysis of assembly quality, length and diversity shows that this dataset represents the most comprehensive expressed gene catalog for any Eucalyptus tree. mRNA-Seq analysis furthermore allowed digital expression profiling of all of the assembled transcripts across diverse xylogenic and non-xylogenic tissues, which is invaluable for ascribing putative gene functions.

Conclusions: De novo assembly of Illumina mRNA-Seq reads is an efficient approach for transcriptome sequencing and profiling in Eucalyptus and other non-model organisms. The transcriptome resource (Eucspresso, http://eucspresso.bi.up.ac.za/) generated by this study will be of value for genomic analysis of woody biomass production in Eucalyptus and for comparative genomic analysis of growth and development in woody and herbaceous plants.

Highlights

  • De novo assembly of transcript sequences produced by short-read DNA sequencing technologies offers a rapid approach to obtain expressed gene catalogs for non-model organisms

  • In the process of producing such a high-quality expressed gene catalog for Eucalyptus, we addressed three main questions. First, is it feasible to de novo assemble Illumina mRNA-Seq data into contiguous, near full-length gene model sequences for Eucalyptus? Second, what genes make up the expressed gene catalog for a fast-growing Eucalyptus plantation tree? Third, can we re-use the mRNA-Seq data to create a tissue- and organ-specific digital expression profile for each assembled contig? We addressed these questions by generating a comprehensive set of expressed gene sequences from a commercially grown Eucalyptus hybrid (E. grandis × E. urophylla) clone using Illumina mRNA-Seq technology and de novo short-read assembly

  • We report the complete annotation of the expressed gene catalog based on comparative analysis with the published Arabidopsis thaliana [25], Populus trichocarpa [26] and Vitis vinifera [27] protein-coding datasets
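The digital expression profile mentioned in the highlights can be illustrated with a minimal sketch. It assumes a length- and depth-normalised measure (reads per kilobase of contig per million mapped reads); the source does not state which normalisation the authors used, and the function name, tissue labels and read counts below are all hypothetical.

```python
from typing import Dict


def expression_profile(
    read_counts: Dict[str, int],
    contig_length_bp: int,
    library_sizes: Dict[str, int],
) -> Dict[str, float]:
    """Return a per-tissue expression value for one assembled contig.

    Assumption: values are reads per kilobase per million mapped reads,
    i.e. count * 1e9 / (contig length in bp * total mapped reads).
    This is an illustrative choice, not the method confirmed by the paper.
    """
    if contig_length_bp <= 0:
        raise ValueError("contig_length_bp must be positive")

    profile: Dict[str, float] = {}
    for tissue, count in read_counts.items():
        total_reads = library_sizes[tissue]
        if total_reads <= 0:
            raise ValueError(f"library size for {tissue!r} must be positive")
        profile[tissue] = count * 1e9 / (contig_length_bp * total_reads)
    return profile


# Hypothetical numbers for a single 2,000 bp contig in two tissues.
example_counts = {"xylem": 100, "leaf": 10}
example_library_sizes = {"xylem": 10_000_000, "leaf": 5_000_000}
example_profile = expression_profile(example_counts, 2000, example_library_sizes)
```

Normalising by both contig length and library size keeps values comparable between long and short contigs and between tissues sequenced to different depths, which is what makes a per-contig profile across tissues interpretable.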


Summary

Introduction

De novo assembly of transcript sequences produced by short-read DNA sequencing technologies offers a rapid approach to obtain expressed gene catalogs for non-model organisms. A draft genome sequence will be produced in 2010 for a Eucalyptus tree species (E. grandis) representing the most important hardwood fibre crop in the world. Genome annotation of this valuable woody plant and genetic dissection of its superior growth and productivity will be greatly facilitated by the availability of a comprehensive collection of expressed gene sequences from multiple tissues and organs. 454 pyrosequencing has proven very useful for generating ESTs representing the majority of expressed genes. This has enabled gene discovery in a variety of previously uncharacterized eukaryotic organisms with little or no a priori DNA sequence information [12,13,14,15,16]. Relatively few published studies have attempted de novo assembly of whole-transcriptome sequences from short-read data such as that generated by Illumina GA or SOLiD technologies. Assembly of short (36-72 bp) read data into accurate, contiguous transcript sequences has only recently been reported [17,18,19], demonstrating that assembly of long, potentially full-length, transcript sequences is possible.

