Multiomics: A user-friendly multi-omics data harmonisation R pipeline

Tyrone Chen, Sonika Tyagi, Kim-Anh Lê Cao, Al J Abadi

doi:10.12688/f1000research.53453.1

Abstract

Data from multiple omics layers of a biological system are growing in quantity, heterogeneity and dimensionality. Simultaneous multi-omics data integration is a growing field of research, as it has strong potential to unlock information on previously hidden biological relationships, leading to early diagnosis, prognosis and expedited treatments. Many tools for multi-omics data integration are being developed. However, these tools are often restricted to highly specific experimental designs and types of omics data. While some general methods do exist, they require specific data formats and experimental conditions. A major limitation in the field is the lack of a single-omics or multi-omics pipeline that can accept data in an unrefined, information-rich form before integration and subsequently generate output for further investigation. There is an increasing demand for a generic multi-omics pipeline to facilitate general-purpose data exploration and analysis of heterogeneous data. We therefore present our R multiomics pipeline, an easy-to-use and flexible pipeline that takes unrefined multi-omics data, sample information and user-specified parameters as input and generates a list of output plots and data tables for quality control and downstream analysis. We demonstrate the application of the pipeline on two separate COVID-19 case studies. The pipeline supports limited checkpointing, in which intermediate output is staged to allow continuation after errors or interruptions, and it generates a script for reproducing the analysis. Integration with the mixOmics R package is seamless, as the R data object can be loaded and manipulated with mixOmics functions. Our pipeline can be installed as an R package or from the git repository, and is accompanied by detailed documentation with walkthroughs of the two case studies. The pipeline is also available as Docker and Singularity containers.

Highlights

A biological phenotype is an emergent property of a complex network of biological interactions
Since relying on a single layer of omics data to test a biological hypothesis results in an incomplete perspective of a biological system, interest in multi-omics data integration is steadily increasing as a means to decipher complex biological phenotypes
We present multiomics,13 a pipeline targeted at bioinformaticians; among other important features, it implements one of the state-of-the-art data harmonisation tools from the mixOmics R package

Summary

Introduction

A biological phenotype is an emergent property of a complex network of biological interactions. Since relying on a single layer of omics data to test a biological hypothesis results in an incomplete perspective of a biological system, interest in multi-omics data integration is steadily increasing as a means to decipher complex biological phenotypes.1 We illustrate these points with a hypothetical case of measuring protein and transcript levels in the same set of matched samples. Complete and detailed examples of the input data format are provided, including a sample dataset which can be loaded directly from the R package. In this manuscript, we summarise this information and show a minimal working example to highlight some of the features of our pipeline.
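To make the idea of matched samples concrete, the following is a minimal sketch in Python, written purely for illustration. It is not the pipeline's R interface. The data layout (samples as keys, features as values), the toy values, and the helper name `align_blocks` are all assumptions introduced here. The sketch checks that a protein block and a transcript block cover exactly the same samples as the sample information, then puts every block into one shared sample order.

```python
# Hypothetical toy data: each omics block maps sample ID -> {feature: value}.
proteome = {
    "sample_1": {"prot_A": 1.2, "prot_B": 0.4},
    "sample_2": {"prot_A": 0.9, "prot_B": 0.7},
    "sample_3": {"prot_A": 1.5, "prot_B": 0.1},
}
transcriptome = {
    "sample_3": {"gene_X": 10.0, "gene_Y": 3.0},
    "sample_1": {"gene_X": 12.0, "gene_Y": 5.0},
    "sample_2": {"gene_X": 8.0, "gene_Y": 4.0},
}
# Hypothetical sample information: sample ID -> phenotype class.
classes = {"sample_1": "case", "sample_2": "control", "sample_3": "case"}


def align_blocks(blocks, classes):
    """Check that all blocks share the samples in `classes` and align them.

    Returns the shared sample order and, per block, a list of
    (sample_id, features) pairs in that order. Raises ValueError if any
    block has missing or extra samples.
    """
    sample_order = sorted(classes)
    expected = set(sample_order)
    for name, block in blocks.items():
        if set(block) != expected:
            mismatched = sorted(set(block) ^ expected)
            raise ValueError(f"{name}: samples not matched: {mismatched}")
    aligned = {
        name: [(sample, block[sample]) for sample in sample_order]
        for name, block in blocks.items()
    }
    return sample_order, aligned


sample_order, aligned = align_blocks(
    {"proteome": proteome, "transcriptome": transcriptome}, classes
)
```

After alignment, row i of every block refers to the same biological sample, which is the property that integration across omics layers relies on.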

Methods

Results and interpretation
