Abstract

The number of RNA-Seq studies has grown in recent years. The design of RNA-Seq studies varies from very simple (e.g., two-condition case-control) to very complicated (e.g., time series involving multiple samples at each time point with separate drug treatments). Most of these publicly available RNA-Seq studies are deposited in NCBI databases, but their metadata are scattered across four different databases: Sequence Read Archive (SRA), BioSample, BioProject, and Gene Expression Omnibus (GEO). Although the NCBI web interface can provide all of the metadata, retrieving study- or project-level information often requires significant effort, because the user must traverse multiple hyperlinks across separate pages. Moreover, project- and study-level metadata lack manual or automatic curation by categories, such as disease type, time series, case-control, or replicate type, which are vital to comprehending any RNA-Seq study. Here we describe “MetaRNA-Seq,” a new tool for interactively browsing, searching, and annotating RNA-Seq metadata with the capability of semiautomatic curation at the study level.

Highlights

  • High-throughput gene expression studies are pivotal in functional biology and genomics research [1]

  • Until late in the last decade, high-throughput gene expression studies were carried out using microarray technology [2], which has provided great insight into several gene expression studies and led to the establishment of public repositories, such as the Gene Expression Omnibus (GEO) hosted by the NCBI [3, 4]

  • Because the type and amount of data are similar to other next-generation sequencing (NGS) data, RNA-Seq studies are deposited in the Sequence Read Archive (SRA) [10]

Introduction

High-throughput gene expression studies are pivotal in functional biology and genomics research [1]. Until late in the last decade, high-throughput gene expression studies were carried out using microarray technology [2], which has provided great insight into gene expression and led to the establishment of public repositories, such as the Gene Expression Omnibus (GEO) hosted by the NCBI [3, 4]. With the advent of next-generation sequencing (NGS) technology and its meteoric growth [5, 6], RNA-Seq is slowly becoming prevalent in high-throughput gene expression studies [7]. RNA-Seq studies were deposited in the GEO database. Project-level metadata for RNA-Seq studies are hosted by the NCBI BioProject database, and sample-level metadata are deposited in the NCBI BioSample database.
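To make the split of metadata across databases concrete, the following is a minimal sketch, not the MetaRNA-Seq implementation, of how one might build NCBI E-utilities request URLs that search SRA and then follow links from one SRA record to its BioSample, BioProject, and GEO entries. The helper names (`esearch_url`, `elink_url`, `metadata_urls`), the search term, and the UID `12345` are hypothetical placeholders chosen for illustration. The code only constructs URLs and performs no network requests, and whether a given cross-database link exists depends on the record.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Entrez database names for the four metadata sources named in the text.
ENTREZ_DB = {
    "SRA": "sra",
    "BioSample": "biosample",
    "BioProject": "bioproject",
    "GEO": "gds",  # GEO DataSets
}


def esearch_url(source, term, retmax=20):
    """Build an ESearch URL that looks up record UIDs in one database."""
    params = {
        "db": ENTREZ_DB[source],
        "term": term,
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def elink_url(source, target, uid):
    """Build an ELink URL that follows a record from one database to another."""
    params = {
        "dbfrom": ENTREZ_DB[source],
        "db": ENTREZ_DB[target],
        "id": uid,
        "retmode": "json",
    }
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


def metadata_urls(sra_uid):
    """Collect the link URLs that lead from one SRA record to the other three sources."""
    return {
        target: elink_url("SRA", target, sra_uid)
        for target in ENTREZ_DB
        if target != "SRA"
    }


if __name__ == "__main__":
    # Hypothetical search term and UID, for illustration only.
    print(esearch_url("SRA", "RNA-Seq"))
    for target, url in metadata_urls("12345").items():
        print(target, url)
```

Each record therefore needs one search plus three link requests before its study-level context is complete, which is the manual traversal that a unified metadata browser is meant to replace.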
