Abstract

Many current RNA-sequencing data analysis methods compare expression one gene at a time, giving little consideration to the correlations among genes. In this study, we propose a method that converts this one-dimensional comparison into a two-dimensional evaluation of the ratio of the standard deviations (SD) of two constructed random variables. The method identifies differentially expressed genes while controlling a preset significance level, conditional on the mean-variance relationship of the read counts. Correlations among genes are naturally accommodated because genes with similar distributions cluster together in the proposed σ-σ plot. The proposed distribution-free method is designated DFseq because it does not rely on a parametric distribution to fit the read counts. As a result, compared with parametric methods, DFseq can effectively handle genes with a bimodal-like distribution, genes with excess zero read counts, and genes with outlying observations. In addition, DFseq is an ideal platform for comparing the performance of different methods for detecting differential gene expression.
