Abstract
Background: Detection of correlated gene expression is a fundamental process in the characterization of gene functions using microarray data. Commonly used methods such as the Pearson correlation can detect only a fraction of interactions between genes or their products. However, the performance of correlation analysis can be significantly improved either by providing additional biological information or by combining correlation with other techniques that can extract various mathematical or statistical properties of gene expression from microarray data. In this article, I will test the performance of three correlation methods, namely the Pearson correlation, the rank (Spearman) correlation, and the Mutual Information approach, in detection of protein-protein interactions, and I will further examine the properties of these techniques when they are used together. I will also develop a new correlation measure which can be used with other measures to improve predictive power.
Results: Using data from 5,896 microarray hybridizations, the three measures were obtained for 30,499 known protein-interacting pairs in the Human Protein Reference Database (HPRD). Pearson correlation showed the best sensitivity (0.305), but the three measures showed similar specificity (0.240 to 0.257). When the three measures were compared, it was found that better specificity could be obtained at a high Pearson coefficient combined with a low Spearman coefficient or low Mutual Information. Using a toy model of two gene interactions, I found that such measure combinations were most likely to occur at stronger curvature. I therefore introduced a new measure, termed asymmetric correlation (AC), which directly quantifies the degree of curvature in the expression levels of two genes as a degree of asymmetry. I found that AC performed better than the other measures, particularly when high specificity was required. Moreover, a combination of AC with other measures significantly improved specificity and sensitivity, by up to 50%.
Conclusions: A combination of correlation measures, particularly AC and Pearson correlation, can improve prediction of protein-protein interactions. Further studies are required to assess the biological significance of asymmetry in expression patterns of gene pairs.
Highlights
Detection of correlated gene expression is a fundamental process in the characterization of gene functions using microarray data
Human protein interaction data were taken from the Human Protein Reference Database (HPRD) [20], which has information on 30,499 interactions.
Mutual Information (MI) was highly sensitive to the value of K. These results show that the elevated Pearson correlation (PC) and the low Spearman correlation (SC) and MI observed in the protein-interacting group are more likely to arise at high levels of non-linearity and noise.
Summary
It is common to examine correlations among gene expression levels [1]. It has been shown that direct inference methods such as Pearson correlation (PC) are suitable for detecting stable protein complexes, whereas conditional methods, including partial PC or the Graphical Gaussian model, are better at defining causal interactions [17]. This means that different methods can be mutually complementary when used to expand detection of protein interactions. I examine the performance of three commonly used measures, namely PC, Spearman correlation (SC), and Mutual Information (MI), in predicting known human protein-protein interactions from microarray data, and assess the possibility of achieving improved performance through data combination. Based on this analysis, I introduce a new measure, termed asymmetric correlation (AC), and show that AC improves the performance of other measures.
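As a rough illustration of the three measures compared here, the sketch below computes the Pearson correlation, the Spearman (rank) correlation, and a histogram-based Mutual Information estimate for a pair of expression vectors. It is a minimal sketch under stated assumptions, not the author's pipeline. The function names, the equal-width binning estimator, the `bins` parameter, and the synthetic curved gene pair are all hypothetical choices made for illustration. In particular, the excerpt does not define K or say which MI estimator was used, and the asymmetric correlation (AC) measure is not reproduced because its definition does not appear in this text.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mutual_information(x, y, bins=8):
    """Mutual Information in bits from an equal-width 2D histogram.

    Assumption: this simple binning estimator stands in for whatever
    estimator the original study used; `bins` is a hypothetical parameter.
    """
    def bin_index(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    n = len(x)
    lox, hix, loy, hiy = min(x), max(x), min(y), max(y)
    joint = {}
    px = [0] * bins
    py = [0] * bins
    for a, b in zip(x, y):
        i = bin_index(a, lox, hix)
        j = bin_index(b, loy, hiy)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] += 1
        py[j] += 1
    # Sum p(i,j) * log2(p(i,j) / (p(i) * p(j))) over occupied cells.
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


if __name__ == "__main__":
    # Hypothetical gene pair: a curved (cubic) relationship plus noise.
    rng = random.Random(0)
    gene_a = [rng.uniform(0.0, 1.0) for _ in range(500)]
    gene_b = [a ** 3 + rng.gauss(0.0, 0.02) for a in gene_a]
    print("PC:", pearson(gene_a, gene_b))
    print("SC:", spearman(gene_a, gene_b))
    for bins in (4, 8, 16):
        print("MI, bins =", bins, ":", mutual_information(gene_a, gene_b, bins))
```

The histogram estimate depends on the number of bins, which is one general reason to report the estimator settings alongside any MI value. Whether this is the dependence on K noted in the highlights cannot be determined from the text here.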