Abstract
Constantly decreasing costs of high-throughput profiling on many molecular levels generate vast amounts of multi-omics data. Studying one biomedical question on two or more omic levels provides deeper insights into underlying molecular processes or disease pathophysiology. For the majority of multi-omics data projects, the data analysis is performed level-wise, followed by a combined interpretation of results. Hence, the full potential of integrated data analysis is not yet leveraged, presumably due to the complexity of the data and the lack of suitable toolsets. We propose a versatile approach to perform a fully integrated multi-level analysis: the Knowledge guIded Multi-Omics Network inference approach, KiMONo (https://github.com/cellmapslab/kimono). KiMONo performs network inference by using statistical models to combine omics measurements, coupled to a powerful knowledge-guided strategy that exploits prior information from existing biological sources. Within the resulting multimodal network, nodes represent features of all input types, e.g. variants and genes, while edges refer to knowledge-supported and statistically derived associations. In a comprehensive evaluation, we show that our method is robust to noise and exemplify its general applicability to the full spectrum of multi-omics data, demonstrating that KiMONo is a powerful approach towards leveraging the full potential of data sets for detecting biomarker candidates.
Highlights
Decreasing costs of high-throughput profiling on many molecular levels generate vast amounts of multi-omics data
We present KiMONo, a novel prior Knowledge guided Multi-Omics Network inference method
The algorithm builds a statistical model for each gene, selects the most predictive features and uses these to assemble a multi-level network
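The three steps in the last highlight (one model per gene, feature selection, network assembly) can be illustrated with a minimal sketch. The text here does not say which statistical model is used, so the L1-penalised (lasso) regression below is an assumption chosen only because it selects features by shrinking weak coefficients to zero. The function names, the `data` and `prior` layouts, the feature names and the penalty `lam` are all hypothetical and are not taken from the KiMONo package.

```python
def standardize(values):
    """Centre a feature and scale it to unit variance (population SD)."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    sd = (sum(c * c for c in centred) / n) ** 0.5
    return [c / sd for c in centred] if sd > 0 else centred


def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(columns, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1.

    Assumes every column is standardized, so x_j.x_j / n == 1.
    """
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Correlation of feature j with the residual that excludes j itself.
            rho = sum(c * r for c, r in zip(col, residual)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta:
                residual = [r - c * delta for r, c in zip(residual, col)]
                beta[j] = new
    return beta


def infer_network(data, prior, lam=0.1):
    """Fit one sparse model per gene; non-zero coefficients become edges.

    data  : dict, feature name -> list of measurements (one per sample)
    prior : dict, gene name -> candidate features taken from prior knowledge
    """
    edges = {}
    for gene, candidates in prior.items():
        y = standardize(data[gene])
        columns = [standardize(data[f]) for f in candidates]
        for feature, coef in zip(candidates, lasso(columns, y, lam)):
            if coef != 0.0:
                edges[(feature, gene)] = coef
    return edges


# Toy input (invented for illustration): gene_A depends on snp_1 and mirna_1 only.
data = {
    "snp_1": [0, 1, 0, 1, 0, 1, 0, 1],
    "snp_2": [0, 0, 1, 1, 0, 0, 1, 1],
    "mirna_1": [0, 0, 0, 0, 1, 1, 1, 1],
}
data["gene_A"] = [2 * s - m for s, m in zip(data["snp_1"], data["mirna_1"])]
prior = {"gene_A": ["snp_1", "snp_2", "mirna_1"]}

edges = infer_network(data, prior, lam=0.1)
for (feature, gene), weight in sorted(edges.items()):
    print(f"{feature} -> {gene}: {weight:+.3f}")
```

In this toy data `snp_2` has no effect on `gene_A`, so its coefficient is shrunk to zero and no edge is created. `snp_1` receives a positive edge weight and `mirna_1` a negative one. The prior map restricts which features each gene model may consider, which is one plausible reading of "knowledge-guided" in the abstract.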
Summary
Decreasing costs of high-throughput profiling on many molecular levels generate vast amounts of multi-omics data. More sophisticated latent factor-based models have been introduced, capable of analysing multiple omic levels simultaneously [4,5]. These methods infer lower-dimensional representations (latent factors) of the original high-dimensional multi-omic data space. Improved interpretability is one of the big advantages of network-based approaches. These identify condition-specific key molecules by inferring and analysing a network representation of the processes [6,7]. To increase the specificity, one can use more advanced machine learning approaches, instead of correlation, to identify associations between nodes [9,10]. These methods are only applicable to high-dimensional multi-omic data with large numbers of samples. MiRlastic uses prior knowledge to increase performance in the analysis of high-dimensional data with low sample sizes.