Abstract
Correlation is ubiquitously used in gene expression analysis although its validity as an objective criterion is often questionable. If no normalization reflecting the original mRNA counts in the cells is available, correlation between genes becomes spurious. Yet the need for normalization can be bypassed using a relative analysis approach called log-ratio analysis. This approach can be used to identify proportional gene pairs, i.e. a subset of pairs whose correlation can be inferred correctly from unnormalized data due to their vanishing log-ratio variance. To interpret the size of non-zero log-ratio variances, a proposal for a scaling with respect to the variance of one member of the gene pair was recently made by Lovell et al. Here we derive analytically how spurious proportionality is introduced when using a scaling. We base our analysis on a symmetric proportionality coefficient (briefly mentioned in Lovell et al.) that has a number of advantages over their statistic. We show in detail how the choice of reference needed for the scaling determines which gene pairs are identified as proportional. We demonstrate that using an unchanged gene as a reference has huge advantages in terms of sensitivity. We also explore the link between proportionality and partial correlation and derive expressions for a partial proportionality coefficient. A brief data-analysis part puts the discussed concepts into practice.
Highlights
The frequently compositional nature of biological data and its methodological implications (a.k.a. analysis of "closed" data) have not yet been widely acknowledged (Lovell et al., 2011).
We base our analysis on a symmetric proportionality coefficient that has a number of advantages over the statistic proposed by Lovell et al.
We show in detail how the choice of reference needed for the scaling determines which gene pairs are identified as proportional.
Summary
The frequently compositional nature of biological data and its methodological implications (a.k.a. analysis of "closed" data) have not yet been widely acknowledged (Lovell et al., 2011). While correlations between the columns of our compositional matrix cannot be defined coherently, the covariance structure of a compositional data matrix can be summarized by considering, for all pairs i, j (i < j), the (sample) variances of their log ratios log(x_i/x_j) (Aitchison, 2003). These will be close to zero if genes i and j maintain an approximately proportional relationship x_i ≈ m x_j across observations for some real value m. In this contribution, we will interpret log-ratio transformations as an attempt to back-transform relative data into absolute data. In good agreement with our analytical results, the approach taken by Lovell et al. leads to a much lower overlap of prediction between absolute and relative data compared with the application of an approximately unchanged reference.
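The log-ratio variance described above can be illustrated with a small numerical sketch. The data here are simulated and purely hypothetical (the gene names, the log-normal abundances and the factor m = 3 are assumptions for illustration, not values from the paper). The sketch shows two things: the sample variance of log(x_i/x_j) is essentially zero for a proportional pair, and it is unaffected by the per-sample rescaling that turns absolute abundances into relative ones.

```python
import numpy as np


def log_ratio_variance(x_i, x_j):
    """Sample variance of log(x_i / x_j) across observations."""
    return np.var(np.log(x_i) - np.log(x_j), ddof=1)


# Hypothetical absolute abundances for three genes over 50 samples.
rng = np.random.default_rng(0)
n_samples = 50
gene_a = rng.lognormal(mean=5.0, sigma=0.5, size=n_samples)
gene_b = 3.0 * gene_a  # exactly proportional to gene_a (m = 3)
gene_c = rng.lognormal(mean=5.0, sigma=0.5, size=n_samples)  # unrelated to gene_a
absolute = np.column_stack([gene_a, gene_b, gene_c])

# "Closed" (relative) data: each sample is divided by its own total,
# so the original per-sample abundance scale is lost.
relative = absolute / absolute.sum(axis=1, keepdims=True)

# Log-ratio variances computed from relative data only.
lrv_ab = log_ratio_variance(relative[:, 0], relative[:, 1])
lrv_ac = log_ratio_variance(relative[:, 0], relative[:, 2])

# The same quantity computed from the absolute data, for comparison.
lrv_ac_absolute = log_ratio_variance(absolute[:, 0], absolute[:, 2])

print(f"proportional pair (a, b): {lrv_ab:.3e}")
print(f"unrelated pair (a, c), relative data: {lrv_ac:.3f}")
print(f"unrelated pair (a, c), absolute data: {lrv_ac_absolute:.3f}")
```

Because the per-sample total cancels inside the ratio, the last two printed values agree up to floating-point error. A non-zero value on its own has no natural scale, which is the interpretive gap that the scaling discussed in the text is meant to address.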