Abstract

Network inference is a valuable approach for gaining mechanistic insight from high-dimensional biological data. Existing methods for network inference focus on ranking all possible relations (edges) among all observed quantities (features), such as genes, proteins, or metabolites, which yields a dense network that is challenging to interpret. Identifying a sparse, interpretable network with these methods therefore requires an error-prone thresholding step that compromises their performance. In this article we propose a new method, DEKER-NET, that addresses this limitation by directly identifying a sparse, interpretable network without thresholding, improving real-world performance. DEKER-NET uses a novel machine learning method for feature selection within an iterative framework for network inference. DEKER-NET is extremely flexible: it handles linear and nonlinear relations, makes no assumptions about the underlying distribution of the data, and is suitable for categorical or continuous variables. We test our method on the Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge data, demonstrating that it can directly identify sparse, interpretable networks without thresholding while maintaining performance comparable to the hypothetical best-case thresholded network of other methods.
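To make the thresholding problem concrete, here is a minimal sketch, assuming a weight-focused method has already assigned a score to every candidate edge. The weights are made-up illustrative values, and `threshold_by_weight` and `threshold_top_k` are hypothetical helpers, not part of DEKER-NET or any of the methods compared. The same ranked list yields networks of different sizes depending on the cutoff, and that cutoff has to be chosen by hand.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def rank_edges(weights: Dict[Edge, float]) -> List[Tuple[Edge, float]]:
    """Sort every candidate edge by weight, highest first (the 'weight-focused' output)."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


def threshold_by_weight(weights: Dict[Edge, float], cutoff: float) -> List[Edge]:
    """Keep only edges whose weight is at least `cutoff` (a hand-chosen parameter)."""
    return [edge for edge, w in rank_edges(weights) if w >= cutoff]


def threshold_top_k(weights: Dict[Edge, float], k: int) -> List[Edge]:
    """Keep the k highest-weighted edges (another hand-chosen parameter)."""
    return [edge for edge, _ in rank_edges(weights)[:k]]


# Hypothetical weights for every pair among four features: the ranking is dense.
weights = {
    ("A", "B"): 0.91, ("A", "C"): 0.40, ("A", "D"): 0.12,
    ("B", "C"): 0.35, ("B", "D"): 0.08, ("C", "D"): 0.77,
}

for cutoff in (0.3, 0.5):
    # A cutoff of 0.3 keeps 4 edges; a cutoff of 0.5 keeps only 2.
    print(cutoff, threshold_by_weight(weights, cutoff))
```

Nothing in the ranked list itself says which cutoff recovers the true network, which is why the thresholded result is error-prone.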

Highlights

  • In recent years the availability of high-dimensional biological data has rapidly outpaced the tools available to understand it

  • Alongside DEKER-NET, we test ‘Correlation’ network inference, which uses the absolute value of the Pearson correlation coefficient to weight edges, and two of the highest performing network inference methods, TIGRESS [10] and GENIE3 [16]

  • Prior to DEKER-NET, network inference methods were primarily limited by their focus on weighting and ranking all possible relationships, which produces a dense, uninterpretable network structure
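The ‘Correlation’ baseline mentioned in the highlights can be sketched in a few lines. This is a minimal illustration, assuming each feature is stored as an equal-length list of continuous measurements with nonzero variance; the function names and toy data are hypothetical, and this is not the benchmark code used in the study.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_network(data: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Weight every unordered pair of features by the absolute Pearson correlation."""
    return {
        (f, g): abs(pearson(data[f], data[g]))
        for f, g in combinations(sorted(data), 2)
    }


# Hypothetical measurements for three genes across five samples.
data = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with g1
    "g3": [5.0, 3.0, 4.0, 1.0, 2.0],   # negatively correlated with both
}

edge_weights = correlation_network(data)
print(edge_weights)
```

Every pair receives a weight, so without a cutoff the resulting "network" is fully connected; taking the absolute value means strong negative correlations rank as highly as positive ones.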


Summary

Introduction

In recent years the availability of high-dimensional biological data has rapidly outpaced the tools available to understand it. We describe an algorithm for analyzing such data that uses an embedded feature selection method to infer networks of mechanistic biological relationships. The best-performing network inference methods have a significant liability, which we address with our new approach. These methods, such as GENIE3 (the best-performing method in the challenges of the Dialogue for Reverse Engineering Assessments and Methods initiative), do not explicitly infer networks but instead rank all possible relationships; we refer to them as ‘weight-focused’. Turning such a ranking into a network requires choosing a threshold by hand. Our method takes a different approach to feature selection and network inference that avoids this hand-tuned thresholding step, instead identifying directly the subset of relationships that forms the network.
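The contrast between ranking every edge and selecting a subset directly can be sketched with an iterative framework: treat each feature in turn as a target, let a feature-selection step choose which of the remaining features to keep as predictors, and take the union of the chosen pairs as the network. This is a minimal sketch under strong assumptions. The selector is a simple greedy forward selection on single-predictor linear fits (a matching-pursuit-style stand-in), not DEKER-NET's own feature selection method, which this summary does not describe. Unlike DEKER-NET, it only captures linear relations among continuous variables. `select_predictors`, `infer_network`, and `min_gain` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

Data = Dict[str, List[float]]


def _center(v: List[float]) -> List[float]:
    """Subtract the mean so fits need no intercept term."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def _fit_step(residual: List[float], x: List[float]) -> Tuple[float, List[float]]:
    """Least-squares fit of the residual on one centered predictor.

    Returns the reduction in residual sum of squares and the updated residual.
    """
    xc = _center(x)
    sxx = sum(a * a for a in xc)
    if sxx == 0:
        return 0.0, residual
    beta = sum(a * r for a, r in zip(xc, residual)) / sxx
    new_residual = [r - beta * a for r, a in zip(residual, xc)]
    gain = sum(r * r for r in residual) - sum(r * r for r in new_residual)
    return gain, new_residual


def select_predictors(target: str, data: Data, min_gain: float = 0.1) -> Set[str]:
    """Greedily add the predictor that most reduces the residual variance of `target`.

    Stops when the best remaining predictor explains less than `min_gain`
    of the target's total variance, so each target decides its own subset size.
    """
    residual = _center(data[target])
    total = sum(r * r for r in residual)
    candidates = set(data) - {target}
    chosen: Set[str] = set()
    while candidates and total > 0:
        best_gain, best_name, best_res = 0.0, None, residual
        for name in sorted(candidates):
            gain, res = _fit_step(residual, data[name])
            if gain > best_gain:
                best_gain, best_name, best_res = gain, name, res
        if best_name is None or best_gain / total < min_gain:
            break
        chosen.add(best_name)
        candidates.remove(best_name)
        residual = best_res
    return chosen


def infer_network(data: Data, min_gain: float = 0.1) -> Set[Tuple[str, ...]]:
    """Union of (predictor, target) pairs selected across all targets, as undirected edges."""
    edges: Set[Tuple[str, ...]] = set()
    for target in data:
        for predictor in select_predictors(target, data, min_gain):
            edges.add(tuple(sorted((predictor, target))))
    return edges


# Hypothetical data: g2 depends on g1, g4 depends on g3, and the two pairs are unrelated.
data = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    "g3": [6.0, 4.0, 5.0, 5.0, 4.0, 6.0],
    "g4": [18.0, 12.0, 15.0, 15.0, 12.0, 18.0],
}

print(infer_network(data))  # 2 of the 6 possible edges are kept
```

The stand-in selector still has a tuning knob (`min_gain`), but it controls when each target stops adding predictors rather than imposing one global cut on a ranked list of all possible edges.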

