Abstract

Rapidly growing public gene expression databases contain a wealth of data for building an unprecedentedly detailed picture of human biology and disease. These data come from many diverse measurement platforms, which makes integrating them difficult. Although RNA-sequencing (RNA-seq) is attracting the most attention, at present the rate of new microarray studies submitted to public databases far exceeds the rate of new RNA-seq studies. There is clearly a need for methods that make it easier to combine data from different technologies. In this paper, we propose a new method for processing RNA-seq data that yields gene expression estimates much more similar to corresponding estimates from microarray data, hence greatly improving cross-platform comparability. The method, which we call PREBS, is based on estimating the expression from RNA-seq reads overlapping the microarray probe regions and processing these estimates with standard microarray summarisation algorithms. Using paired microarray and RNA-seq samples from the TCGA LAML data set, we show that PREBS expression estimates derived from RNA-seq are more similar to microarray-based expression estimates than those from other RNA-seq processing methods. In an experiment to retrieve paired microarray samples from a database using an RNA-seq query sample, gene signatures defined from PREBS expression estimates were found to be much more accurate than those from other methods. PREBS also allows new ways of using RNA-seq data, such as expression estimation for microarray probe sets. An implementation of the proposed method is available in the Bioconductor package “prebs.”
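The summarisation step described in the abstract can be illustrated with a minimal Python sketch. It assumes that per-probe read counts for one probe set are already available as a probes × samples matrix, and it uses Tukey's median polish (the summarisation used in RMA) as one example of a "standard microarray summarisation algorithm". The log2(count + 1) transform, the pseudocount of 1, and the helper names `median_polish` and `summarise_probe_set` are illustrative assumptions; this is not the API of the Bioconductor package "prebs", which is written in R.

```python
import numpy as np


def median_polish(y, max_iter=10, eps=0.01):
    """Tukey median polish of a probes x samples matrix (rows = probes).

    Fits y[i, j] ~ overall + row[i] + col[j] and returns overall + col[j],
    i.e. one summarised value per sample (an RMA-style summary).
    The sweep order and stopping rule mirror R's stats::medpolish.
    """
    z = np.array(y, dtype=float)      # residuals, updated in place
    n_rows, n_cols = z.shape
    overall = 0.0
    row = np.zeros(n_rows)            # probe effects
    col = np.zeros(n_cols)            # sample effects
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep out row (probe) medians, then recentre the column effects.
        r_delta = np.median(z, axis=1)
        z -= r_delta[:, None]
        row += r_delta
        delta = np.median(col)
        col -= delta
        overall += delta
        # Sweep out column (sample) medians, then recentre the row effects.
        c_delta = np.median(z, axis=0)
        z -= c_delta[None, :]
        col += c_delta
        delta = np.median(row)
        row -= delta
        overall += delta
        # Stop when the total absolute residual no longer changes much.
        new_sum = np.abs(z).sum()
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall + col


def summarise_probe_set(counts):
    """Hypothetical helper: per-probe read counts (probes x samples)
    -> log2(count + 1) -> median polish -> one value per sample."""
    return median_polish(np.log2(np.asarray(counts, dtype=float) + 1.0))


# Three probes of one probe set measured in three samples (toy counts).
counts = [[1, 3, 7], [3, 7, 15], [7, 15, 31]]
print(summarise_probe_set(counts))  # -> [2. 3. 4.]
```

Median polish is chosen here because it is based on medians, so a single probe with an aberrant count mostly shifts its own probe effect rather than the per-sample summaries.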

Highlights

  • Public gene expression databases such as ArrayExpress [1] and Gene Expression Omnibus [2] host public data from more than half a million gene expression experiments

  • In this paper we present a method for processing RNA-seq data that makes the resulting expression measures significantly more comparable with measures derived from microarray data, by estimating the expression level at the microarray probe regions using a method we call PREBS (Probe Region Expression estimation Based on Sequencing)

  • Gene expression is estimated from RNA-seq data by counting the number of reads that overlap with exons of the gene [17, 18]
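The read-overlap counting mentioned in the highlights can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: reads and regions are half-open [start, end) intervals on a single chromosome and strand; spliced alignments, multi-mapping reads and strand-specific protocols are ignored; and the function names are hypothetical. The same function counts reads over a gene's exons (the conventional approach) or over individual microarray probe regions, whose per-probe counts are the kind of input a probe-level summarisation step would take.

```python
from typing import Iterable, List, Tuple

# Half-open genomic interval [start, end) on one chromosome/strand (assumption).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def count_overlapping_reads(reads: Iterable[Interval], regions: List[Interval]) -> int:
    """Number of reads overlapping at least one region (each read counted once)."""
    return sum(1 for read in reads if any(overlaps(read, reg) for reg in regions))


def per_probe_counts(reads: List[Interval], probes: List[Interval]) -> List[int]:
    """Hypothetical helper: one read count per probe region."""
    return [count_overlapping_reads(reads, [probe]) for probe in probes]


reads = [(90, 110), (150, 250), (250, 290), (390, 450), (500, 550)]
exons = [(100, 200), (300, 400)]                 # toy gene model with two exons
probes = [(180, 205), (380, 405), (600, 625)]    # toy 25-bp probe regions

print(count_overlapping_reads(reads, exons))  # -> 3
print(per_probe_counts(reads, probes))        # -> [1, 1, 0]
```

A production implementation would read alignments from BAM files and use an interval index rather than this quadratic scan; the sketch only shows the overlap rule.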


Summary

Introduction

Public gene expression databases such as ArrayExpress [1] and Gene Expression Omnibus [2] host public data from more than half a million gene expression experiments. The existing microarray-based data represent a huge …

Probe Region Expression Estimation for RNA-Seq Data. PLOS ONE, DOI:10.1371/journal.pone.0126545, May 12, 2015.

