Abstract

Biomarker discovery aims to find biomarkers involved in the biological mechanisms of a disease under study, which can then be used for diagnosis, prognosis, drug development, and related applications. Although current high-throughput technologies provide a deluge of data per point, research is usually constrained to small samples, which impedes reliable biomarker discovery. Decades of research on biomarker discovery have produced useful, but still limited, prior knowledge of cancer biology, such as small gene sets already known to be involved in cancer. This information, if properly integrated with feature selection, could help detect new biomarkers. However, most current methods for biomarker discovery cannot easily be extended to account for such prior knowledge. Recent work proposes a hierarchical Bayesian framework for feature selection that places priors on both the identity of all features and the identity-conditioned feature distribution. Various models have been obtained from this framework, including the dependent good dependent bad (DGDB) model. An approximate solution of DGDB has been used with a set selection heuristic to successfully find genes involved in colon cancer and multiple sclerosis. While that approximate solution relies only on training data, we propose a new algorithm, hereafter called Informed Approximate 3MNC-DGDB (IA-3MNC), that takes advantage of previously known biomarkers to find additional biomarkers. In three synthetic simulations we illustrate that (a) IA-3MNC outperforms many popular feature selection algorithms, and (b) prior knowledge helps to correctly detect additional biomarkers, particularly under small samples. We apply IA-3MNC to colon cancer and breast cancer datasets deposited in the Gene Expression Omnibus under accession numbers GSE1456 and GSE41850, respectively. Examining the top 20 genes selected by IA-3MNC and the top 10 enriched pathways, we find that many of the highly ranked genes and pathways have been suggested to be involved in cancer.

