NBLDA: negative binomial linear discriminant analysis for RNA-Seq data.

Kai Dong, Hongyu Zhao, Xiang Wan, Tiejun Tong

doi:10.1186/s12859-016-1208-1

Open Access

Journal: BMC Bioinformatics
Publication Date: Sep 13, 2016
Citations: 66
License type: CC BY 4.0

Affiliation: Hong Kong Baptist University, Yale University

Abstract

Background: RNA-sequencing (RNA-Seq) has become a powerful technology for characterizing gene expression profiles because it is more accurate and comprehensive than microarrays. Statistical methods developed for microarray data can be applied to RNA-Seq data, but they are not ideal because RNA-Seq data are discrete counts. The Poisson and negative binomial distributions are commonly used to model count data. Recently, Witten (Annals Appl Stat 5:2493–2518, 2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson assumption may be less appropriate than the negative binomial distribution when biological replicates are available and the data are overdispersed (i.e., when the variance is larger than the mean). However, negative binomial variables are more complicated to model because they involve a dispersion parameter that must be estimated.

Results: In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. Using Bayes’ rule, we construct the classifier by fitting a negative binomial model, and we propose plug-in rules to estimate the unknown parameters in the classifier. We explore the relationship between the negative binomial classifier and the Poisson classifier and numerically investigate the impact of dispersion on the discriminant score. Simulation results show the superiority of our proposed method. We also analyze two real RNA-Seq data sets to demonstrate the advantages of our method in real-world applications.

Conclusions: We have developed a new classifier for RNA-Seq data classification based on the negative binomial model. Our simulation results show that the proposed classifier performs better than existing methods, and it can serve as an effective tool for classifying RNA-Seq data. Based on the comparison results, we provide guidelines to help scientists decide which method to use in the discriminant analysis of RNA-Seq data. R code is available at http://www.comp.hkbu.edu.hk/~xwan/NBLDA.R or https://github.com/yangchadam/NBLDA
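The following is a minimal Python sketch (NumPy only) of a negative binomial discriminant classifier built by Bayes’ rule. It is not the authors' implementation and rests on several assumptions that do not come from the paper. The simulated parameter values are hypothetical. The plug-in estimates of the size factors $s_i$, the gene totals $\lambda_g$ and the class effects $d_{kg}$ are modeled on the total-count form used for Poisson LDA by Witten (2011). The per-gene method-of-moments dispersion estimate stands in for the authors' own plug-in rule. The function names simulate_nb, fit, nb_scores and poisson_scores are hypothetical and are not the API of the authors' R code.

```python
import numpy as np


def simulate_nb(n_per_class, n_genes, phi, rng):
    """Draw X_ig | y_i = k ~ NB(s_i * lam_g * d_kg, phi) for two classes.

    Every parameter value here is hypothetical and chosen only for illustration.
    """
    y = np.repeat([0, 1], n_per_class)
    s = rng.uniform(0.2, 2.2, size=y.size)            # sample size factors s_i
    lam = rng.exponential(25.0, size=n_genes)         # gene baselines lambda_g
    d = np.ones((2, n_genes))                          # class effects d_kg (1 = no change)
    de = rng.choice(n_genes, size=n_genes // 5, replace=False)
    d[1, de] = rng.choice([0.5, 2.0], size=de.size)   # 20% of genes halve or double
    mu = s[:, None] * lam[None, :] * d[y]              # NB means, shape (samples, genes)
    r = 1.0 / phi                                      # NumPy's size parameter n = 1/phi
    X = rng.negative_binomial(r, r / (r + mu))         # mean mu, variance mu + phi*mu^2
    return X, y


def fit(X, y, n_classes, beta=1.0):
    """Assumed Witten-style plug-in estimates plus a stand-in dispersion estimate."""
    X = np.asarray(X, dtype=float)
    total = X.sum()
    s = X.sum(axis=1) / total                          # total-count size factors
    lam = X.sum(axis=0)                                # gene totals
    d = np.empty((n_classes, X.shape[1]))
    for k in range(n_classes):
        in_k = y == k
        # Gamma(beta, beta)-style shrinkage keeps every d_kg strictly positive.
        d[k] = (X[in_k].sum(axis=0) + beta) / (s[in_k].sum() * lam + beta)
    # Stand-in, not the authors' estimator: per-gene method of moments using
    # E[(X - mu)^2] = mu + phi * mu^2, truncated at zero.
    mu = s[:, None] * lam[None, :] * d[y]
    num = ((X - mu) ** 2 - mu).sum(axis=0)
    den = (mu ** 2).sum(axis=0)
    phi = np.clip(np.divide(num, den, out=np.zeros_like(num), where=den > 0), 0.0, None)
    log_prior = np.log(np.bincount(y, minlength=n_classes) / y.size)
    return {"total": total, "lam": lam, "d": d, "phi": phi, "log_prior": log_prior}


def nb_scores(model, x_new):
    """Class scores: class-dependent NB log-likelihood plus log prior (largest wins)."""
    s_new = x_new.sum() / model["total"]
    mu = s_new * model["lam"] * model["d"]            # (classes, genes)
    phi = model["phi"]
    a = np.log1p(mu * phi)                             # log(1 + mu * phi_g)
    safe_phi = np.where(phi > 0, phi, 1.0)
    penalty = np.where(phi > 0, a / safe_phi, mu)      # phi_g -> 0 limit is mu
    return ((x_new * (np.log(model["d"]) - a)).sum(axis=1)
            - penalty.sum(axis=1) + model["log_prior"])


def poisson_scores(model, x_new):
    """Poisson LDA scores with the same plug-in estimates, for comparison."""
    s_new = x_new.sum() / model["total"]
    mu = s_new * model["lam"] * model["d"]
    return (x_new * np.log(model["d"])).sum(axis=1) - mu.sum(axis=1) + model["log_prior"]


rng = np.random.default_rng(0)
X, y = simulate_nb(n_per_class=40, n_genes=200, phi=0.1, rng=rng)
is_train = np.arange(y.size) % 2 == 0                 # alternate samples: train / test
model = fit(X[is_train], y[is_train], n_classes=2)
X_test, y_test = X[~is_train], y[~is_train]
pred = np.array([np.argmax(nb_scores(model, x)) for x in X_test])
print("NB discriminant accuracy:", (pred == y_test).mean())
```

If every $\phi_g$ is set to zero, nb_scores reproduces poisson_scores exactly. This gives a quick numerical check that the Poisson classifier is the zero-dispersion special case of the negative binomial one. Both scores are computed on the log scale, and the class with the largest score is predicted.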

Highlights

RNA-sequencing (RNA-Seq) has become a powerful technology to characterize gene expression profiles because it is more accurate and comprehensive than microarrays
Simulation design: we generate the data from the following negative binomial distribution: $X_{ig} \mid y_i = k \sim \mathrm{NB}(s_i \lambda_g d_{kg}, \phi)$
We have further explored the relationship between negative binomial linear discriminant analysis (NBLDA) and Poisson linear discriminant analysis (PLDA), and investigated the impact of dispersion on the discriminant score of NBLDA by conducting a numerical comparison
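A short derivation shows the relationship between NBLDA and PLDA. It assumes the mean-dispersion parameterization in which $\mathrm{NB}(\mu, \phi)$ has variance $\mu + \phi\mu^2$, and it is a derivation under those assumptions rather than a transcription of the paper's equations. Consider a test sample with counts $x^*_g$ and size factor $s^*$. Write $\mu_{kg} = s^*\lambda_g d_{kg}$ and let $\pi_k$ be the prior probability of class $k$. The dispersion is allowed to vary by gene as $\phi_g$, and the common $\phi$ of the simulation design is a special case. By Bayes’ rule, $\log P(y^* = k \mid x^*)$ equals $\log \pi_k + \sum_g \log f_{\mathrm{NB}}(x^*_g; \mu_{kg}, \phi_g)$ plus a term that does not depend on $k$. Dropping every class-free term gives the score $\delta_k^{\mathrm{NB}}$:

```latex
\begin{aligned}
\log f_{\mathrm{NB}}(x;\mu,\phi)
  &= \log\frac{\Gamma(x+\phi^{-1})}{\Gamma(\phi^{-1})\,x!}
     + x\log(\mu\phi) - \left(x+\phi^{-1}\right)\log(1+\mu\phi),\\
\delta_k^{\mathrm{NB}}(x^*)
  &= \sum_{g} x^*_g\left[\log d_{kg} - \log\left(1+s^*\lambda_g d_{kg}\phi_g\right)\right]
     - \sum_{g}\phi_g^{-1}\log\left(1+s^*\lambda_g d_{kg}\phi_g\right) + \log\pi_k,\\
\lim_{\phi_g\to 0}\delta_k^{\mathrm{NB}}(x^*)
  &= \sum_{g} x^*_g\log d_{kg} - s^*\sum_{g}\lambda_g d_{kg} + \log\pi_k
   = \delta_k^{\mathrm{P}}(x^*).
\end{aligned}
```

The limit uses $\log(1+\mu\phi)/\phi \to \mu$ and $x\log(1+\mu\phi) \to 0$ as $\phi \to 0$. The result is the Poisson LDA score $\delta_k^{\mathrm{P}}$, so a positive dispersion changes the discriminant score only through the $\log(1 + s^*\lambda_g d_{kg}\phi_g)$ terms.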

Summary

Introduction

RNA-sequencing (RNA-Seq) is a revolutionary technology that uses the capabilities of next-generation sequencing to infer gene expression levels [1,2,3]. RNA-Seq has many advantages, including the detection of novel transcripts, low background signal, and increased specificity and sensitivity. Because sequencing costs have fallen, RNA-Seq has been widely used in biomedical research in recent years [4]. An RNA-Seq experiment usually produces millions of short reads, between 25 and 300 base pairs in length. The reads are mapped to genomic or transcriptomic regions of interest.

Similar Papers

Selecting Classification Methods for Small Samples of Next-Generation Sequencing Data.
Jiadi Zhu ... Yan Zhou
Frontiers in Genetics, Vol. 12, 04 Mar 2021

Differential expression analysis of RNA sequencing data by incorporating non-exonic mapped reads.
Hung-I Harry Chen ... Devanand Sarkar
BMC Genomics, Vol. 16, Suppl. 7, 11 Jun 2015

Classifying next-generation sequencing data using a zero-inflated Poisson model.
Yan Zhou ... Baoxue Zhang
Bioinformatics, Vol. 34, 27 Nov 2017

Robust and efficient identification of biomarkers from RNA-Seq data using median control chart
Md Shahjaman ... Md Bipul Hossen
F1000Research, Vol. 8, 03 Jan 2019
