Abstract

ObjectiveThe objective of this experiment was to identify transcripts in baker’s yeast (Saccharomyces cerevisiae) that could have originated from previously non-coding genomic regions, or de novo. We generated this data to be able to compare the transcriptomes of different species of Ascomycota.Data descriptionWe generated high-depth RNA sequencing data for 11 species of yeast: Saccharomyces cerevisiae, Saccharomyces paradoxus, Saccharomyces mikatae, Saccharomyces kudriavzevii, Saccharomyces bayanus, Naumovia castelii, Kluyveromyces lactis, Lachancea waltii, Lachancea thermotolerans, Lachancea kluyveri, and Schizosaccharomyces pombe. Using RNA-Seq from yeast grown in rich and oxidative conditions we created genome-guided de novo assemblies of the transcriptomes for each species. We included synthetic spike-in transcripts in each sample to determine the lower limit of detection of the sequencing platform as well as the reliability of our de novo transcriptome assembly pipeline. We subsequently compared the de novo transcripts assemblies to the reference gene annotations and generated assemblies that comprised both annotated and novel transcripts.

Highlights

  • We selected these 11 species based on their sparse taxonomic distribution, their amenability to growth in a custom rich media, the availability of genome assemblies, and their inclusion in previous studies of de novo genes in yeast

  • We created a non-redundant set of features from the reference annotations combined with our de novo assembled transcripts; de novo assembled transcripts which correspond to annotated features according to Cuffmerge version 2.2.0 [6] were discarded, while those that did not were considered to be novel; we identified an average of 700 novel transcripts per species [1] (Table 1)

  • Using the External RNA Control Consortium (ERCC) RNA Spike-in [4], we calculated that the lower limit of detection for annotated features in our pipeline was 2 transcripts per million (TPM), and the lower limit of expression necessary to reliably assemble novel transcripts was 15 TPM; over half of the unannotated transcripts that we assembled were expressed above this conservative threshold of 15 TPM in at least one of the two conditions

Read more

Summary

Introduction

*Correspondence: will.blevins@upf.edu 1 Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute and Universitat Pompeu Fabra, Barcelona, Spain Full list of author information is available at the end of the article of organisms for experiments ranging from experimental evolution to genetic engineering. We selected these 11 species based on their sparse taxonomic distribution, their amenability to growth in a custom rich media, the availability of genome assemblies, and their inclusion in previous studies of de novo genes in yeast.

Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call