Abstract

The identification of biomarker signatures is important for cancer diagnosis and prognosis. However, the detection of clinically reliable signatures is hampered by limited data availability, which may restrict statistical power. Moreover, methods for the integration of large sample cohorts and for signature identification are limited. We present a step-by-step computational protocol for functional gene expression analysis and the identification of diagnostic and prognostic signatures by combining meta-analysis with machine learning and survival analysis. The novelty of the toolbox lies in its all-in-one functionality, generic design, and modularity. It is exemplified for lung cancer, including a comprehensive evaluation using different validation strategies. However, the protocol is not restricted to specific disease types and can therefore be used by a broad community. The accompanying R package vignette runs in ~1 h and describes the workflow in detail for use by researchers with limited bioinformatics training.

Highlights

  • The combination of biomarkers allows us to represent the information contained in biological samples and fluids, supporting clinical decisions [1]. Numerous studies demonstrated the clinical usefulness of diagnostic and prognostic gene-expression signatures derived from microarray analysis [2,3].

  • Starting from this, we developed a protocol for the systematic calculation of diagnostic and prognostic gene signatures that combines (i) meta-analysis with (ii) functional gene expression analysis and (iii) machine learning (ML) approaches.

Summary

Introduction

The combination of biomarkers (a so-called biomarker signature) allows us to represent the information contained in biological samples and fluids, supporting clinical decisions [1]. Numerous studies have demonstrated the clinical usefulness of diagnostic (disease detection) and prognostic (disease outcome) gene-expression signatures derived from microarray analysis [2,3]. For example, MammaPrint is a 70-gene prognostic expression signature for predicting disease outcome in breast cancer [4]. However, the identification of reliable clinical signatures is restricted by dataset availability, which often reduces statistical power [3,6]. Increasing the number of samples by combining different large cohorts through dataset merging (meta-analysis) is a beneficial solution that has enabled numerous insights into biological systems [7,8,9,10], but methods for biomarker signature identification are currently limited.
