Abstract

Correlation networks are frequently used to statistically extract biological interactions between omics markers. Network edge selection is typically based on the statistical significance of the correlation coefficients. This procedure, however, is not guaranteed to capture biological mechanisms. Here we propose an alternative approach for network reconstruction: a cutoff selection algorithm that maximizes the overlap of the inferred network with available prior knowledge. We first evaluate the approach on IgG glycomics data, for which the biochemical pathway is known and well characterized. Importantly, even when the prior knowledge is incomplete or incorrect, the optimal network is close to the true optimum. We then demonstrate the generalizability of the approach with applications to untargeted metabolomics and transcriptomics data. For the transcriptomics case, we demonstrate that the optimized network is superior to statistical networks in systematically retrieving interactions that were not included in the biological reference used for optimization.

Highlights

  • Correlation networks are frequently used to statistically extract biological interactions between omics markers

  • Correlation network inference often relies on correlation cutoffs based on p-values, which are known to be substantially affected by sample size and are subject to an arbitrary choice of significance level and multiple testing correction procedures

  • We showed that an exception to this general observation is GeneNet[11], which exhibits remarkable robustness to sample size but is still subject to the choice of a proper statistical cutoff
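
The sample-size dependence noted in the highlights can be illustrated with a minimal sketch. It assumes a plain Pearson correlation and uses the Fisher z approximation to find the smallest absolute correlation that is significant at a two-sided level alpha. The function name `critical_correlation` and the default `alpha=0.05` are illustrative choices, not taken from the paper, and no multiple testing correction is applied.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_correlation(n_samples: int, alpha: float = 0.05) -> float:
    """Approximate smallest |r| that is significant at two-sided level alpha.

    Assumption: Pearson correlation with the Fisher z approximation, where
    atanh(r) is roughly normal with standard error 1 / sqrt(n - 3).
    """
    if n_samples <= 3:
        raise ValueError("the Fisher z approximation needs more than 3 samples")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return tanh(z_crit / sqrt(n_samples - 3))


# The cutoff shrinks as the sample size grows, at a fixed significance level.
for n in (20, 100, 1000):
    print(n, round(critical_correlation(n), 3))
```

Under these assumptions, a fixed significance level translates into a weaker correlation cutoff for larger data sets, so the same alpha yields denser networks as more samples are added.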


Summary

Results

Statistical correlation cutoffs depend on sample size. For most correlation measures, the larger the sample size, the lower the resulting correlation cutoff at a given significance level. [...] 3 and 4, respectively), the overall performance was lower than that of GeneNet. In conclusion, using prior information to optimize the correlation cutoff made it possible to infer the same optimal network regardless of the sample size of the data set. When comparing the optimization results obtained from these biological references with those obtained from the full biochemical pathway (adjacency matrix 3 in Fig. 4c), we observe that, while the overall performance varies, the optimal values are close to each other, producing similar networks. To showcase how these overlap differences between optimized and statistical networks affect the inferred networks, we visualized the partial correlation network obtained with our optimization. The analysis demonstrated that the optimized networks are superior to statistical cutoff-based networks.
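
The idea of choosing the cutoff by overlap with prior knowledge can be sketched as follows. This is a simplified illustration, not the authors' implementation: the overlap is scored here with an F1 measure as a stand-in for whatever overlap statistic the paper uses, and the names `f1_overlap` and `optimize_cutoff`, as well as the toy correlation matrix, are hypothetical.

```python
from itertools import combinations


def f1_overlap(predicted: set, reference: set) -> float:
    """F1 score between two edge sets (stand-in overlap measure, an assumption)."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def optimize_cutoff(corr, reference_edges):
    """Scan every observed |correlation| as a cutoff; keep the best-scoring one.

    corr is a symmetric matrix given as a list of lists; reference_edges is a
    set of (i, j) index pairs with i < j taken from prior knowledge.
    """
    pairs = list(combinations(range(len(corr)), 2))
    candidates = sorted({abs(corr[i][j]) for i, j in pairs})
    best_cutoff, best_score = None, -1.0
    for cutoff in candidates:
        edges = {(i, j) for i, j in pairs if abs(corr[i][j]) >= cutoff}
        score = f1_overlap(edges, reference_edges)
        if score > best_score:  # ties keep the lower (denser) cutoff
            best_cutoff, best_score = cutoff, score
    return best_cutoff, best_score


# Toy example: four markers, with two interactions known from prior knowledge.
toy_corr = [
    [1.00, 0.80, 0.10, 0.30],
    [0.80, 1.00, 0.70, 0.20],
    [0.10, 0.70, 1.00, 0.05],
    [0.30, 0.20, 0.05, 1.00],
]
toy_reference = {(0, 1), (1, 2)}
print(optimize_cutoff(toy_corr, toy_reference))
```

Because the selected cutoff depends only on which edges best match the reference, and not on a significance level, this kind of procedure is not tied to the sample size in the way a p-value threshold is.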
