CAncer bioMarker Prediction Pipeline (CAMPP)-A standardized framework for the analysis of quantitative biological data.

Thilde Terkelsen,Elena Papaleo,Anders Krogh,Mihaela Pertea

doi:10.1371/journal.pcbi.1007665

Abstract

With the improvement of -omics and next-generation sequencing (NGS) methodologies, along with the lowered cost of generating these types of data, the analysis of high-throughput biological data has become standard both for forming and testing biomedical hypotheses. Our knowledge of how to normalize datasets to remove latent undesirable variances has grown extensively, making for standardized data that are easily compared between studies. Here we present the CAncer bioMarker Prediction Pipeline (CAMPP), an open-source R-based wrapper (https://github.com/ELELAB/CAncer-bioMarker-Prediction-Pipeline -CAMPP) intended to aid bioinformatic software-users with data analyses. CAMPP is called from a terminal command line and is supported by a user-friendly manual. The pipeline may be run on a local computer and requires little or no knowledge of programming. To avoid issues relating to R-package updates, a renv .lock file is provided to ensure R-package stability. Data-management includes missing value imputation, data normalization, and distributional checks. CAMPP performs (I) k-means clustering, (II) differential expression/abundance analysis, (III) elastic-net regression, (IV) correlation and co-expression network analyses, (V) survival analysis, and (VI) protein-protein/miRNA-gene interaction networks. The pipeline returns tabular files and graphical representations of the results. We hope that CAMPP will assist in streamlining bioinformatic analysis of quantitative biological data, whilst ensuring an appropriate bio-statistical framework.

Highlights

The availability of sensitive and specific biomarkers for disease diagnosis, prognosis, and monitoring, is an attractive alternative to many of the current methods in use
The CAncer bioMarker Prediction Pipeline (CAMPP) pipeline supports different types of analysis and will provide the user with graphics to support results—all plots displayed in this publication were generated with CAMPP with no or very minimal editing
CAMPP is implemented in such a way that a user is able to run the pipeline on their local computer on datasets with up to 300–500 samples and 40.000 variables

Summary

Introduction

The availability of sensitive and specific biomarkers for disease diagnosis, prognosis, and monitoring, is an attractive alternative to many of the current methods in use. The presence and levels of certain tissue-derived molecular markers can help distinguish subtypes in heterogeneous diseases such as cancer [1,2]. Biomarkers may be predictive of patient outcome and responsiveness to treatment [3,4]. CAMPP - A standardized framework for the analysis of biological data

Methods

Results

Conclusion