Empirical Bayesian analysis of paired high-throughput sequencing data with a beta-binomial distribution.

Thomas J Hardcastle, Krystyna A Kelly

doi:10.1186/1471-2105-14-135

Journal: BMC Bioinformatics
Publication Date: Apr 23, 2013
Citations: 30
License type: CC BY 2.0

Affiliation: University of Cambridge

Abstract

Background
Pairing of samples arises naturally in many genomic experiments; for example, gene expression in tumour and normal tissue from the same patients. Methods for analysing high-throughput sequencing data from such experiments are required to identify differential expression, both within paired samples and between pairs under different experimental conditions.

Results
We develop an empirical Bayesian method based on the beta-binomial distribution to model paired data from high-throughput sequencing experiments. We examine the performance of this method on simulated and real data in a variety of scenarios. Our methods are implemented as part of the R baySeq package (versions 1.11.6 and greater), available from Bioconductor (http://www.bioconductor.org).

Conclusions
We compare our approach to alternatives based on generalised linear models and show that our method offers significant gains in performance on simulated data. In testing on real data from oral squamous cell carcinoma patients, we discover greater enrichment of previously identified head and neck squamous cell carcinoma-associated gene sets than has previously been achieved through a generalised linear modelling approach, suggesting that similar gains in performance may be found in real data. Our methods thus show real and substantial improvements in analyses of high-throughput sequencing data from paired samples.
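As a rough illustration of the distribution named in the abstract, the sketch below evaluates a beta-binomial log-likelihood for one tag's paired counts. It assumes, for illustration only, that each pair contributes the count in its first sample out of the pair total, and it uses a mean/dispersion parameterisation (`p`, `phi`) with alpha = p(1 - phi)/phi and beta = (1 - p)(1 - phi)/phi. The function names, this parameterisation and the counts are hypothetical; this is not the baySeq implementation.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def betabinom_logpmf(y, n, p, phi):
    """log P(Y = y) for Y ~ BetaBinomial(n, alpha, beta).

    Assumed parameterisation: mean proportion p in (0, 1) and dispersion
    phi in (0, 1), with alpha = p (1 - phi) / phi and
    beta = (1 - p) (1 - phi) / phi, so that phi = 1 / (alpha + beta + 1).
    """
    alpha = p * (1 - phi) / phi
    beta = (1 - p) * (1 - phi) / phi
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + log_beta(y + alpha, n - y + beta) - log_beta(alpha, beta)

def paired_loglik(pairs, p, phi):
    """Log-likelihood for one tag: the first count of each pair out of the pair total."""
    return sum(betabinom_logpmf(a, a + b, p, phi) for a, b in pairs)

# Hypothetical (tumour, normal) counts for one tag in three patients.
pairs = [(30, 12), (25, 9), (41, 20)]
print(paired_loglik(pairs, p=0.5, phi=0.05))  # no within-pair difference
print(paired_loglik(pairs, p=0.7, phi=0.05))  # tumour-biased proportion
```

Here p = 0.5 stands for "no within-pair difference" only if both libraries are sequenced to the same depth. Library-size scaling, and the estimation of priors from the data and computation of posteriors that make an approach empirical Bayesian, are deliberately omitted from this sketch.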

Highlights

Pairing of samples arises naturally in many genomic experiments; for example, gene expression in tumour and normal tissue from the same patients.
We present here an empirical Bayesian method based on an over-dispersed binomial distribution, the beta-binomial, for detecting both types of differential expression in paired sequencing data: within paired samples, and between pairs under different experimental conditions.
The data from high-throughput sequencing experiments used in differential expression analysis may be thought of as a set of tags, defining the unique reads sequenced in the experiment, and a set of counts, giving the number of times each tag is observed in each of the sequenced libraries made from the samples.
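The tags-and-counts view of sequencing data described above can be sketched as a small table builder. This is a minimal illustration assuming each library is given as a plain list of read sequences and that each distinct sequence is one tag; alignment, quality filtering and library-size normalisation are left out, and the function and library names are hypothetical.

```python
from collections import Counter

def count_table(libraries):
    """Build a tag-by-library count table.

    `libraries` maps a library name to the list of reads sequenced in it;
    each distinct read sequence is treated as one tag.
    Returns the sorted tags, the library names, and one row of counts per tag.
    """
    per_library = {name: Counter(reads) for name, reads in libraries.items()}
    tags = sorted(set().union(*per_library.values()))
    names = list(libraries)
    # A Counter returns 0 for a tag that never appeared in that library.
    rows = [[per_library[name][tag] for name in names] for tag in tags]
    return tags, names, rows

# Hypothetical reads from one patient's tumour and normal libraries.
libraries = {
    "patient1_tumour": ["ACGT", "ACGT", "TTGA"],
    "patient1_normal": ["ACGT", "GGCC"],
}
tags, names, rows = count_table(libraries)
for tag, row in zip(tags, rows):
    print(tag, dict(zip(names, row)))
```

In a paired design, the two columns belonging to the same patient would form the pair whose counts are compared for each tag.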

Summary

Introduction

Pairing of samples arises naturally in many genomic experiments; for example, gene expression in tumour and normal tissue from the same patients. High-throughput sequencing count data are generally modelled using an over-dispersed Poisson distribution (usually the negative-binomial distribution [5,6,7]), although the beta-binomial distribution [8] has also been used. These methods offer relatively robust and sensitive detection of differential expression, either through pairwise comparisons [6,7] or through a model-based approach [5].
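To make "over-dispersed" concrete, the sketch below computes, by summing over each distribution's support, the mean and variance of a negative binomial (mean `mu`, dispersion `phi`, variance mu + phi * mu^2) and of a beta-binomial (mean proportion `p`, dispersion `phi`, variance n p (1 - p) (1 + (n - 1) phi)). Both exceed the Poisson variance mu and the binomial variance n p (1 - p) respectively. These parameterisations are common conventions assumed here, not necessarily those used in [5,6,7,8] or in baySeq, and the parameter values are arbitrary.

```python
import math

def log_choose(n, y):
    """Logarithm of the binomial coefficient C(n, y)."""
    return math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)

def nbinom_pmf(y, mu, phi):
    """Negative binomial with mean mu and dispersion phi (size r = 1 / phi)."""
    r = 1.0 / phi
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def betabinom_pmf(y, n, p, phi):
    """Beta-binomial with mean proportion p and dispersion phi = 1 / (alpha + beta + 1)."""
    a = p * (1 - phi) / phi
    b = (1 - p) * (1 - phi) / phi

    def log_beta(u, v):
        return math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v)

    return math.exp(log_choose(n, y) + log_beta(y + a, n - y + b) - log_beta(a, b))

def mean_var(probs):
    """Mean and variance of a distribution given as probabilities over 0, 1, 2, ..."""
    mean = sum(y * q for y, q in enumerate(probs))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
    return mean, var

# Over-dispersed Poisson: variance mu + phi * mu**2 instead of mu.
# The support is truncated at 2000; the remaining tail mass is negligible here.
mu, phi_nb = 20.0, 0.2
nb_mean, nb_var = mean_var([nbinom_pmf(y, mu, phi_nb) for y in range(2001)])

# Over-dispersed binomial: variance n p (1 - p) (1 + (n - 1) phi) instead of n p (1 - p).
n, p, phi_bb = 50, 0.3, 0.1
bb_mean, bb_var = mean_var([betabinom_pmf(y, n, p, phi_bb) for y in range(n + 1)])

print(nb_mean, nb_var, mu + phi_nb * mu ** 2)
print(bb_mean, bb_var, n * p * (1 - p) * (1 + (n - 1) * phi_bb))
```

In both cases `phi` controls how far the variance exceeds that of the non-dispersed model, and setting it towards zero recovers the Poisson or binomial behaviour.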

Methods

Results

Conclusion