Abstract

Background

RNA sequencing (RNA-seq) is an indispensable tool in the study of gene regulation. While the technology has brought with it better transcript coverage and quantification, there remain considerable barriers to entry for the computational biologist to analyse large data sets. There is a real need for a repository of uniformly processed RNA-seq data that is easy to use.

Findings

To address these obstacles, we developed Digital Expression Explorer 2 (DEE2), a web-based repository of RNA-seq data in the form of gene-level and transcript-level expression counts. DEE2 contains >5.3 trillion assigned reads from 580,000 RNA-seq data sets including species Escherichia coli, yeast, Arabidopsis, worm, fruit fly, zebrafish, rat, mouse, and human. Base-space sequence data downloaded from the National Center for Biotechnology Information Sequence Read Archive underwent quality control prior to transcriptome and genome mapping using open-source tools. Uniform data processing methods ensure consistency across experiments, facilitating fast and reproducible meta-analyses.

Conclusions

The web interface allows users to quickly identify data sets of interest using accession number and keyword searches. The data can also be accessed programmatically using a specifically designed R package. We demonstrate that DEE2 data are compatible with statistical packages such as edgeR or DESeq. Bulk data are also available for download. DEE2 can be found at http://dee2.io.
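As an illustration of what working with gene-level expression counts can look like, the following minimal Python sketch scales the raw counts of each run to counts per million (CPM) so that runs with different sequencing depths can be compared by eye. The gene names, count values, and the function name `counts_per_million` are hypothetical and are not taken from DEE2. Packages such as edgeR and DESeq take raw counts as input and apply their own normalisation, so this step is for exploration only.

```python
def counts_per_million(counts):
    """Scale one run's raw gene counts to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("run has no assigned reads")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical gene-level counts for two runs (made-up values, not DEE2 data).
run_a = {"geneA": 500, "geneB": 1500, "geneC": 0}
run_b = {"geneA": 100, "geneB": 250, "geneC": 50}

cpm_a = counts_per_million(run_a)
cpm_b = counts_per_million(run_b)

# After scaling, each run sums to one million regardless of its depth.
for name, cpm in (("run_a", cpm_a), ("run_b", cpm_b)):
    print(name, {gene: round(value) for gene, value in cpm.items()})
```

In this toy example geneA receives the same CPM in both runs even though its raw counts differ fivefold, because the two runs differ in total assigned reads.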

Highlights

  • RNA sequencing (RNA-seq) is an indispensable tool in the study of gene regulation

  • In an effort to boost reuse of public transcriptome data, we developed Digital Expression Explorer 2 (DEE2), an open-access, web-based repository of uniformly processed RNA-seq gene-level and transcript-level expression data for several major organisms, compatible with many types of downstream analyses

  • We conclude that highly correlated data sets are standard RNA-seq/mRNA-seq, whereas low correlation is mostly due to nonstandard library construction protocols and, for some data sets, to less characterized biological states

Summary

Background

Since its first description 10 years ago, RNA sequencing (RNA-seq) has become a powerful method in transcriptomics, allowing highly accurate gene expression quantification [1]. As the cost of sequencing decreases, RNA-seq data are becoming ubiquitous in the scientific literature. It is standard practice in the field, and a compulsory requirement of journals, to deposit these data in the Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA) [2,3] as raw and processed files, with the aim of fostering greater reuse and transparency.

