Abstract

The Cancer Genome Atlas database makes it possible to analyze genome-wide RNA-Seq cancer expression data using paired counts, that is, studies in which expression data are collected from normal and cancer cells sampled from the same individual. Correlation of gene expression profiles is the most common way to identify co-expression groups, which are in turn used to interpret -omics big data biologically. The aim of this paper is threefold. First, we show, for the first time, the presence of a “regulation-correlation bias” in RNA-Seq paired expression data, that is, an artifactual link between the expression status (up- or down-regulation) of a gene pair and the sign of the corresponding correlation coefficient. Second, we provide a statistical model that explains theoretically why such a bias arises. Third, we present a bias-removal algorithm, called SEaCorAl, that effectively reduces the effects of the bias and improves the biological significance of correlation analysis. We validate SEaCorAl by showing a significant increase in the ability of positive correlations to detect biologically meaningful associations and a significant increase in the modularity of the resulting unbiased correlation network.
