Abstract

Background: Gene co-expression analysis has previously been based on measures that include correlation coefficients and mutual information, as well as newcomers such as MIC. These measures depend primarily on the degree of association between the RNA levels of two genes and to a lesser extent on their variability. They focus on the similarity of expression value trajectories that change in like manner across samples. However, there are relationships of biological interest for which these classical measures are expected to be insensitive. These include genes whose expression levels are ratiometrically stable and genes whose variance is tightly constrained. Large-scale studies of relatively homogeneous samples, including single-cell RNA-seq, are experimental settings in which such relationships might be especially pertinent.

Results: We develop and implement a ratiometric approach for detecting gene associations (abbreviated RA). It is based on the coefficient of variation of the measured expression ratio of each pair of genes. We apply it to a collection of lymphoblastoid RNA-seq data from the 1000 Genomes Project Consortium, a typical sample set with high overall homogeneity. RA is a selective method, reporting in this case ~1/4 of all possible gene pairs, yet these relationships include a distilled picture of biological relationships previously found by other methods. In addition, RA reveals expression relationships that are not detected by traditional correlation and mutual information methods. We also analyze data from individual lymphoblastoid cells and show that desirable properties of the RA method extend to single-cell RNA-seq.

Conclusion: We show that our ratiometric method identifies biologically significant relationships that are often missed or low-ranked by conventional association-based methods when applied to a relatively homogeneous dataset. The results open new questions about the regulatory mechanisms that produce strong RA relationships. RA is scalable and potentially well suited for the analysis of thousands of bulk-RNA or single-cell transcriptomes.

Electronic supplementary material: The online version of this article (doi:10.1186/1471-2105-15-331) contains supplementary material, which is available to authorized users.

Highlights

  • Gene co-expression analysis has previously been based on measures that include correlation coefficients and mutual information, as well as newcomers such as the maximal information coefficient (MIC)

  • The relative dispersion of the expression ratio of a gene pair is measured by the coefficient of variation (CV), which is the standard deviation of the ratio divided by the mean of the ratio

  • We use ΔCV to explicitly model the variability in expression values for a given gene pair (A,B), which may affect traditional measures of co-expression differently than it affects the ratiometric method (RA)
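
The CV of a gene-pair ratio described in the highlights can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ratio_cv`, the use of the sample standard deviation (`ddof=1`), the requirement of strictly positive denominators, and the toy expression values are all assumptions made for illustration.

```python
import numpy as np


def ratio_cv(a, b, ddof=1):
    """Coefficient of variation of the per-sample ratio a / b.

    CV = std(ratio) / mean(ratio). The choice ddof=1 (sample standard
    deviation) is an assumption; the source does not specify it.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("a and b must have one value per sample")
    if np.any(b <= 0):
        # Assumption: samples with zero expression are filtered out beforehand.
        raise ValueError("denominator expression values must be positive")
    ratio = a / b
    return ratio.std(ddof=ddof) / ratio.mean()


# Hypothetical expression values across four samples.
gene_a = np.array([100.0, 200.0, 400.0, 800.0])
gene_b = np.array([50.0, 100.0, 200.0, 400.0])   # always half of gene_a
gene_c = np.array([400.0, 100.0, 800.0, 200.0])  # no fixed ratio to gene_a

stable = ratio_cv(gene_a, gene_b)    # ratio is 2 in every sample, so CV is 0
unstable = ratio_cv(gene_a, gene_c)  # ratio varies widely, so CV is large

print(f"CV(A/B) = {stable:.3f}, CV(A/C) = {unstable:.3f}")
```

A low CV marks a ratiometrically stable pair. Note that the CV of A/B generally differs from the CV of B/A; this sketch computes only the orientation it is given.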


Summary

Introduction

Gene co-expression analysis has previously been based on measures that include correlation coefficients and mutual information, as well as newcomers such as MIC. These measures depend primarily on the degree of association between the RNA levels of two genes and to a lesser extent on their variability. Analyses of gene co-expression that use measures of association, such as the Pearson and Spearman correlation coefficients, the squared Pearson correlation coefficient (R²), and mutual information, are ubiquitous in modern biology. These measures of association are the basis for the most widely used clustering techniques [1], and are used for a diversity of network motif and inference algorithms ([2,3,4] and references therein). First, is there an effective way to detect biological relationships from these expression data that will not depend heavily on sample heterogeneity? Second, are there classes of relationships of potential biological importance that have been persistently missed or undervalued by the existing methods?

