Abstract

The integration of multiple omics datasets measured on the same samples is a challenging task: data come from heterogeneous sources and vary in signal quality. In addition, some omics data are inherently compositional, e.g. sequence count data. Most integrative methods are limited in their ability to handle covariates, missing values, compositional structure and heteroscedasticity. In this article we introduce COMBI, a flexible model-based approach to data integration that addresses these limitations. We combine concepts such as compositional biplots and log-ratio link functions with latent variable models, and propose multiplots as an attractive visualization to improve interpretation. Using real data examples and simulations, we illustrate our method and compare it with other data integration techniques. Our algorithm is available in the R package combi.

Highlights

  • With the latest advances in high-throughput technologies, an increasing number of omics data types are emerging that require statistical analysis and data integration tools

  • The underlying idea of data integration is that the common origin of the datasets engenders some relationship between the measurements, i.e. the biological state of the organism is reflected in the different views

  • We adopt an explorative approach to unearth patterns that extend across different datasets, and identify relationships between features from different views

Introduction

With the latest advances in high-throughput technologies, an increasing number of omics data types are emerging that require statistical analysis and data integration tools. These tools must be tailored to the data types under study, whilst being user-friendly, fast to compute and interpretable in their results. The underlying idea of data integration is that the common origin of the datasets engenders some relationship between the measurements, i.e. the biological state of the organism is reflected in the different views. The goals of such data integration can be very diverse. Using dimension reduction and visualization that focus on the strongest biological patterns in several high-dimensional datasets, our aim is to give researchers a first insight into the data structure and to highlight sample clusters and feature relationships that can be further investigated in follow-up studies.
