Bias correction and Bayesian analysis of aggregate counts in SAGE libraries

Russell L Zaretzki, William M Briggs, Artin Armagan, Michael A Gilchrist

doi:10.1186/1471-2105-11-72

Abstract

Background: Tag-based techniques, such as SAGE, are commonly used to sample the mRNA pool of an organism's transcriptome. Incomplete digestion during the tag formation process may allow multiple tags to be generated from a given mRNA transcript. The probability of forming a tag varies with its relative location. As a result, the observed tag counts represent a biased sample of the actual transcript pool. In SAGE this bias can be avoided by ignoring all but the 3'-most tag, but doing so discards a large fraction of the observed data. Taking this bias into account should allow more of the available data to be used, leading to increased statistical power.

Results: Three new hierarchical models, which directly embed a model for the variation in tag formation probability, are proposed and their associated Bayesian inference algorithms are developed. These models may be applied to libraries at both the tag and aggregate level. Simulation experiments and analysis of real data are used to contrast the accuracy of the various methods. The consequences of tag formation bias are discussed in the context of testing differential expression, and a description is given of how these algorithms can be applied in that context.

Conclusions: Several Bayesian inference algorithms that account for tag formation effects are compared with the DPB algorithm, providing clear evidence of superior performance. The accuracy of inferences when using a particular non-informative prior is found to depend on the expression level of a given gene. The multivariate nature of the approach easily allows both univariate and joint tests of differential expression. Calculations demonstrate the potential for false positive and false negative findings due to variation in tag formation probabilities across samples when testing for differential expression.

Highlights

Tag-based techniques, such as Serial Analysis of Gene Expression (SAGE), are commonly used to sample the mRNA pool of an organism’s transcriptome
Libraries based on Digital Gene Expression (DGE) are used to address the same questions and provide much larger numbers of tags leading to increased statistical power
The first objective of the current work is to provide a methodology that allows multiple tags, which arise due to incomplete digestion, to be combined and used to infer expression levels of the underlying mRNA transcripts

Summary

Introduction

Tag-based techniques, such as SAGE, are commonly used to sample the mRNA pool of an organism’s transcriptome. Tag-based transcriptome sequencing libraries consist of a collection of short sequences of DNA called tags, along with tabulated counts of the number of times each tag is observed in a sample. These observed tag counts represent a sample from a much larger pool of mRNA tags in a tissue or organism. SAGE was used to assess differential expression across cells from different tissues or strains, or cells grown under different experimental conditions. Next-generation methods such as Digital Gene Expression (DGE) tag profiling [1] provide a more efficient way to generate tag libraries and are growing in popularity. The close similarities between DGE and SAGE, in particular the use of restriction enzymes, lead both techniques to share the same inherent biases.
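The bias described in the abstract, where incomplete digestion lets one transcript yield tags at several positions with different probabilities, can be illustrated with a small simulation. The sketch below is not the paper's model: it assumes, purely for illustration, that each restriction site is cleaved independently with the same probability and that the tag comes from the 3'-most cleaved site. The names `tag_site_probs`, `simulate_tag_counts` and `p_cleave` are hypothetical.

```python
import random


def tag_site_probs(p_cleave, n_sites):
    """Probability that the tag comes from site k (k = 1 is the 3'-most site).

    Assumption: sites are cleaved independently with probability p_cleave,
    and the tag is formed at the 3'-most site that was cleaved.
    """
    return [p_cleave * (1 - p_cleave) ** (k - 1) for k in range(1, n_sites + 1)]


def simulate_tag_counts(n_transcripts, p_cleave, n_sites, seed=0):
    """Simulate tag counts per site for copies of a single transcript."""
    rng = random.Random(seed)
    counts = [0] * n_sites
    for _ in range(n_transcripts):
        # Walk from the 3'-most site toward the 5' end; the first cleaved
        # site yields the tag. A transcript with no cleaved site gives no tag.
        for k in range(n_sites):
            if rng.random() < p_cleave:
                counts[k] += 1
                break
    return counts


probs = tag_site_probs(0.8, 3)
counts = simulate_tag_counts(10_000, 0.8, 3)

three_prime_only = counts[0]  # data kept when all but the 3'-most tag are ignored
aggregate = sum(counts)       # data available when tags are combined

print("site probabilities:", probs)
print("counts per site:", counts)
print("3'-most only:", three_prime_only, "aggregate:", aggregate)
```

Under these assumptions, the gap between `three_prime_only` and `aggregate` is the portion of the observed data that is discarded when only the 3'-most tag is used, and the unequal per-site counts show why raw tag counts are a biased sample of the transcript pool.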



Journal: BMC Bioinformatics
Publication Date: Feb 3, 2010
Citations: 27
License type: CC BY 2.0


