Robust and efficient identification of biomarkers from RNA-Seq data using median control chart

Md Shahjaman, Md Rezanur Rahman, Md Ibnul Asifuzzaman, Md Mamunur Rashid, Habiba Akter, Md Bipul Hossen

doi:10.12688/f1000research.17351.1

Open Access

https://doi.org/10.12688/f1000research.17351.1

Journal: F1000Research	Publication Date: Jan 3, 2019
Citations: 1	License type: CC BY 4.0

Affiliation: Begum Rokeya University, Aga Khan University

Abstract

Background: One of the main goals of RNA-seq data analysis is the identification of biomarkers that are differentially expressed (DE) across two or more experimental conditions. RNA-seq uses next-generation sequencing technology and has many advantages over microarrays. Numerous statistical methods have already been developed for identifying biomarkers from RNA-seq data. Most of these methods are based on either the Poisson distribution or the negative binomial distribution. However, existing methods struggle to identify biomarkers efficiently from discrete RNA-seq data when the datasets contain outliers or extreme observations. In particular, their performance deteriorates further when the data come from a small number of samples in the presence of outliers. Therefore, in this study, we propose an outlier detection and modification approach for RNA-seq data to overcome these problems of traditional methods. We make the proposed method applicable to RNA-seq data by transforming the read count data into continuous data.

Methods: We use a median control chart to detect and modify the outlying observations in a log-transformed RNA-seq dataset. To investigate the performance of the proposed method in the absence and presence of outliers, we employ five popular biomarker selection methods (edgeR, edgeR_robust, DESeq, DESeq2 and limma) on both simulated and real datasets.

Results: The simulation results strongly suggest that the proposed method improves performance in the presence of outliers. The proposed method also detected an additional 18 outlying DE genes from a real mouse RNA-seq dataset that were not detected by traditional methods. The KEGG pathway and gene ontology analysis results suggest that these genes may be biomarkers, which requires validation in a wet lab.

Conclusions: We recommend applying the proposed method for biomarker identification from other RNA-seq data.

Highlights

One of the major objectives of researchers is to identify biomarkers from RNA-seq data that are differentially expressed (DE) between two or more experimental conditions.
Performance evaluation: to evaluate the performance of the different biomarker selection methods, we considered the area under the receiver operating characteristic (ROC) curve.
Biomarker identification under two or more conditions is an important task for elucidating the molecular basis of phenotypic variation.
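
The area under the ROC curve mentioned in the highlights can be illustrated with a short sketch. This is not the authors' evaluation code: it assumes a simulation setting in which the truly DE genes are known, and it assumes each method yields a per-gene score where a higher value means stronger evidence of differential expression (for example, one minus the p-value). The function name `auc_from_scores` is hypothetical. It uses the rank-based (Mann-Whitney) form of the AUC, with tied scores given average ranks.

```python
import numpy as np

def auc_from_scores(scores, is_de):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores : per-gene evidence of DE, higher = more evidence (assumption)
    is_de  : per-gene truth labels, True for genes that are truly DE
    """
    scores = np.asarray(scores, dtype=float)
    is_de = np.asarray(is_de, dtype=bool)
    n_pos = int(is_de.sum())
    n_neg = is_de.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one DE and one non-DE gene")

    # Assign 1-based ranks in ascending score order; ties share the average rank.
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size, dtype=float)
    i = 0
    while i < scores.size:
        j = i
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1

    # U statistic for the DE group, normalised by the number of DE/non-DE pairs.
    u = ranks[is_de].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

An AUC close to 1 means the method ranks truly DE genes above non-DE genes almost everywhere, while a value near 0.5 corresponds to a random ranking.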

Summary

Introduction

One of the major objectives of researchers is to identify biomarkers from RNA-seq data that are differentially expressed (DE) between two or more experimental conditions. Outliers may arise in RNA-seq count data because there are several data-generating stages, from the biological harvesting of RNA samples to the counting of mapped sequence reads[13]. To mitigate this issue, many algorithms use transformation methods. Several transformation methods exist for RNA-seq data: logarithmic transformation[14], variance-stabilizing transformation (vst)[6], TMM transformation[15], regularized logarithm[8] and variance modeling at the observation level (voom)[16]. These methods only pull low-level outliers into a reasonable range during parameter estimation; they fail to reduce the influence of high-level outliers in the data matrix when sample sizes are small. In this study, we propose an outlier detection and modification approach for RNA-seq data to improve the performance of the popular biomarker selection methods in the presence of outliers. A broad simulation study and a real data study have been carried out to evaluate it; see Results and Conclusions.
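
The idea of detecting and modifying outliers on log-transformed counts can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the page does not give the exact chart limits, so the sketch assumes limits of the group median plus or minus `k` scaled MADs with `k = 3`, a `log2(count + 1)` transform, and replacement of flagged values by the group median. The function name `median_control_chart` and all parameter names are hypothetical.

```python
import numpy as np

def median_control_chart(counts, groups, k=3.0):
    """Flag and replace outlying values gene by gene within each condition.

    counts : genes x samples matrix of raw read counts
    groups : condition label for each sample (column)
    k      : width of the control limits in scaled MAD units (assumed, not from the source)
    Returns the modified log2 matrix and a boolean mask of flagged entries.
    """
    counts = np.asarray(counts, dtype=float)
    groups = np.asarray(groups)
    logged = np.log2(counts + 1.0)  # assumed transform to continuous data
    cleaned = logged.copy()
    outlier_mask = np.zeros(logged.shape, dtype=bool)

    for g in np.unique(groups):
        cols = np.flatnonzero(groups == g)
        block = logged[:, cols]
        # Centre line: per-gene median within the condition.
        center = np.median(block, axis=1, keepdims=True)
        # Spread: median absolute deviation, scaled for consistency with the normal SD.
        mad = 1.4826 * np.median(np.abs(block - center), axis=1, keepdims=True)
        # Outside the limits center +/- k * mad; genes with zero MAD are left untouched.
        flagged = (mad > 0) & (np.abs(block - center) > k * mad)
        # Modification step (assumed): replace flagged values by the group median.
        cleaned[:, cols] = np.where(flagged, center, block)
        outlier_mask[:, cols] = flagged

    return cleaned, outlier_mask
```

Working within each condition keeps a genuine between-condition difference from being mistaken for an outlier, and the median and MAD are used because they remain stable when a single extreme count is present in a small group.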

Methods

Results

Conclusions
