Abstract

Discovery of robust diagnostic or prognostic biomarkers is key to optimizing therapeutic benefit for select patient cohorts, an idea commonly referred to as precision medicine. Most discovery studies that derive such markers from high-dimensional transcriptomics datasets are weakly powered, with sample sizes in the tens of patients. Therefore, highly regularized statistical approaches are essential to making generalizable predictions. At the same time, prior knowledge-driven approaches have been successfully applied to the manual interpretation of high-dimensional transcriptomics datasets. In this work, we assess the impact of combining two orthogonal approaches for the discovery of biomarker signatures, namely (1) well-known lasso-based regression approaches and their more recent derivative, the group lasso, and (2) the discovery of significant upstream regulators in literature-derived biological networks. Our method integrates both approaches in a weighted group-lasso model and differentially weights gene sets based on inferred active regulatory mechanisms. Using nested cross-validation as well as independent clinical datasets, we demonstrate that our approach leads to increased accuracy and generalizable results. We implement our approach in a computationally efficient, user-friendly R package called creNET. The package can be downloaded at https://github.com/kouroshz/creNet and is accompanied by a parsed version of the STRING DB database.

Highlights

  • Discovery of robust diagnostic or prognostic biomarkers is key to optimizing therapeutic benefit for select patient cohorts, an idea commonly referred to as precision medicine[1]

  • Most discovery studies to derive markers from high-dimensional transcriptomics datasets are weakly powered with sample sizes in the tens of patients

  • Even clinical trials with accompanying omics measurements usually enlist far fewer than 100 patients per study arm

Summary

Introduction

Discovery of robust diagnostic or prognostic biomarkers is key to optimizing therapeutic benefit for select patient cohorts, an idea commonly referred to as precision medicine[1]. Given the high dimensionality of transcriptomics data, regularized statistical approaches are essential to making robust predictions[3]. Regularized classification methods such as ℓ1-regularized logistic regression (known as the lasso)[4] or its variations[5,6] are very popular machine-learning techniques for addressing the high dimensionality of the datasets. In [10], the authors introduced a variation of the group-lasso norm, called the ‘overlap-norm’, to extend the method to cases in which groups overlap. Another approach to avoiding overfitting consists of incorporating prior biological knowledge into the classification procedure. Gene set and pathway enrichment methods[12] are a mainstay of expert-driven analysis of transcriptomics data. Commercial tools such as Qiagen’s IPA[13] are used to discover biological insights in numerous biomedical publications (https://www.qiagenbioinformatics.com/). We presented an improved method for discovering upstream regulators based on a generalization of Fisher’s exact test that works well with a given mixed causal/non-causal gene regulatory network and a set of differentially expressed genes[16].
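
To make the enrichment idea behind upstream-regulator discovery concrete, the following is a minimal Python sketch of the standard one-sided Fisher's exact test for over-representation of a regulator's targets among differentially expressed genes. It is not the generalization described in [16] and not code from the creNET package; the function names `fisher_enrichment_pvalue` and `regulator_enrichment` and the toy gene identifiers are assumptions made for illustration only.

```python
from math import comb


def fisher_enrichment_pvalue(n_universe, n_targets, n_de, n_overlap):
    """One-sided Fisher's exact test (hypothetical helper, for illustration).

    Returns P(X >= n_overlap), where X is hypergeometric: the number of a
    regulator's targets found among n_de differentially expressed (DE)
    genes drawn from a universe of n_universe genes, n_targets of which
    are targets of the regulator.
    """
    upper = min(n_targets, n_de)
    if not 0 <= n_overlap <= upper:
        raise ValueError("n_overlap must lie between 0 and min(n_targets, n_de)")
    total = comb(n_universe, n_de)
    # Sum the hypergeometric probabilities of the observed overlap and of
    # every more extreme (larger) overlap. comb() returns 0 for infeasible
    # terms, so no extra bounds check is needed inside the sum.
    tail = sum(
        comb(n_targets, k) * comb(n_universe - n_targets, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def regulator_enrichment(universe, regulator_targets, de_genes):
    """Compute the enrichment p-value from gene identifier collections."""
    universe = set(universe)
    targets = set(regulator_targets) & universe
    de = set(de_genes) & universe
    return fisher_enrichment_pvalue(
        len(universe), len(targets), len(de), len(targets & de)
    )


# Toy example with made-up gene names: all three targets are DE.
universe = [f"g{i}" for i in range(10)]
p_value = regulator_enrichment(universe, ["g0", "g1", "g2"], ["g0", "g1", "g2"])
print(p_value)
```

A small p-value suggests that the regulator's targets are over-represented among the differentially expressed genes. In a weighted group-lasso setting such as the one described in the abstract, scores of this kind could plausibly inform how gene sets are weighted, although the exact weighting scheme is not given in this excerpt.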

