Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery

Xin Guan, Li Liu, George Runger

doi:10.1186/s12859-020-3344-x


Open Access

https://doi.org/10.1186/s12859-020-3344-x


Abstract

Background: In biomarker discovery, applying domain knowledge is an effective approach to eliminating false positive features, prioritizing functionally impactful markers and facilitating the interpretation of predictive signatures. Several computational methods have been developed that formulate knowledge-based biomarker discovery as a feature selection problem guided by prior information. These methods often require that prior information be encoded as a single score, and the algorithms are optimized for biological knowledge of a specific type. In practice, however, domain knowledge from diverse resources can provide complementary information, yet no current method can integrate heterogeneous prior information for biomarker discovery. To address this problem, we developed the Know-GRRF (know-guided regularized random forest) method, which enables dynamic incorporation of domain knowledge from multiple disciplines to guide feature selection.

Results: Know-GRRF embeds domain knowledge in a regularized random forest framework. It combines prior information from multiple domains in a linear model to derive a composite score, which, together with other tuning parameters, controls the regularization of the random forest model. Know-GRRF concurrently optimizes the weight given to each type of domain knowledge and the other tuning parameters to minimize the AIC of out-of-bag predictions. The objective is to select a compact feature subset that has high discriminative power and strong functional relevance to the biological phenotype. Via rigorous simulations, we show that Know-GRRF guided by multiple-domain prior information outperforms feature selection methods guided by single-domain prior information or no prior information. We then applied Know-GRRF to a real-world study to identify prognostic biomarkers of prostate cancer. We evaluated the combination of cancer-related gene annotations, evolutionary conservation and pre-computed statistical scores as the prior knowledge to assemble a panel of biomarkers. We discovered a compact set of biomarkers with significant improvements in prediction accuracy.

Conclusions: Know-GRRF is a powerful novel method for incorporating knowledge from multiple domains into feature selection. It has a broad range of applications in biomarker discovery. We implemented this method and released a KnowGRRF package in the R/CRAN archive.
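The scoring idea described in the Results can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the KnowGRRF package or the published formulas: the function names, the logistic squashing of the composite score, and the `delta` scale parameter are all hypothetical. The only parts taken from the text are the linear combination of per-domain prior scores and the use of AIC as the tuning criterion; the gain rule follows the general regularized random forest idea of penalizing features that have not yet been selected.

```python
import math

def composite_scores(priors, weights):
    """Linear combination of per-domain prior scores, one row per feature."""
    return [sum(w * p for w, p in zip(weights, row)) for row in priors]

def penalty_coefficients(scores, delta=1.0):
    """Map composite scores to coefficients in (0, 1).

    ASSUMPTION: a logistic squashing with a hypothetical scale `delta`;
    the published method may use a different mapping.
    """
    return [1.0 / (1.0 + math.exp(-delta * s)) for s in scores]

def regularized_gain(gain, feature, selected, lam):
    """RRF-style rule: a not-yet-selected feature has its gain scaled down
    by its own coefficient; already-selected features keep the full gain."""
    return gain if feature in selected else lam[feature] * gain

def aic(log_likelihood, n_features):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_features - 2 * log_likelihood

# Two features scored in two hypothetical knowledge domains.
priors = [[1.0, 0.0], [0.0, 0.0]]
weights = [2.0, 1.0]          # per-domain weights, tuned in the real method
lam = penalty_coefficients(composite_scores(priors, weights))
# Feature 1 has no prior support, so its gain is penalized more heavily.
print(regularized_gain(0.4, 1, selected={0}, lam=lam))
```

In a full implementation, the domain weights and `delta` would be searched jointly, with each candidate setting scored by the AIC of the out-of-bag predictions; the search itself is omitted here.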

Highlights

In biomarker discovery, applying domain knowledge is an effective approach to eliminating false positive features, prioritizing functionally impactful markers and facilitating the interpretation of predictive signatures.
We developed the know-guided regularized random forest (Know-GRRF) algorithm, a generalized form of regularized random forests (RRF), to enable the incorporation of prior information in feature selection [13].
We demonstrated that integrating multiple types of prior information using Know-GRRF significantly improves feature selection accuracy.

Summary

Introduction

In biomarker discovery, applying domain knowledge is an effective approach to eliminating false positive features, prioritizing functionally impactful markers and facilitating the interpretation of predictive signatures. Several computational methods have been developed that formulate knowledge-based biomarker discovery as a feature selection problem guided by prior information. No current method can integrate heterogeneous prior information for biomarker discovery. To address this problem, we developed the Know-GRRF (know-guided regularized random forest) method, which enables dynamic incorporation of domain knowledge from multiple disciplines to guide feature selection.

Biomarker discovery aims to identify a concise molecular signature of a biological phenotype from among a large number of features. To facilitate this process, data-driven feature selection methods that prioritize features based on their discriminative power have been widely employed. Park et al. proposed an l1-regularized linear regression model that prioritizes cancer genes showing dependence of copy number alterations on expression levels [12]. Although these methods perform well in specific domains, the feasibility of using them to incorporate knowledge from other domains remains unclear. Given the availability of diverse functional annotations, a generalizable approach that can evaluate domain knowledge from heterogeneous resources and automatically determine the optimal combination for guided feature selection is highly desirable.



Journal: BMC Bioinformatics
Publication Date: Mar 1, 2020
Citations: 16
License type: open-access


Similar Papers

Know-GRRF: Domain-Knowledge Informed Biomarker Discovery with Random Forests
Xin Guan ... Li Liu
01 Jan 2018

Biomarker discovery in inflammatory bowel diseases using network-based feature selection.
Mostafa Abbas ... Stavros I Dimitriadis
PLOS ONE | VOL. 14
22 Nov 2019

Striving towards excellence in research on biomarkers.
Deepak Malviya ... Sukhminderjit Singh Bajwa
Indian journal of anaesthesia | VOL. 66
01 Jan 2021

Integrating Prior Information with Bayesian Feature Selection
Ali Foroughi Pour ... Lori A Dalton
20 Aug 2017
