A unified framework for the integration of multiple hierarchical clusterings or networks from multi-source data

Audrey Hulot, Denis Laloë, Florence Jaffrézic

doi:10.1186/s12859-021-04303-4

Abstract

Background: Integrating data from different sources is a recurring question in computational biology. Much effort has been devoted to the integration of data sets of the same type, typically multiple numerical data tables. However, data types are generally heterogeneous: it is commonplace to gather data in the form of trees, networks or factorial maps, as these representations all have an appealing visual interpretation that helps to study grouping patterns and interactions between entities. The question we aim to answer in this paper is how to integrate such representations.

Results: To this end, we provide a simple procedure to compare data of various types, in particular trees or networks, that relies on two steps: the first step projects the representations into a common coordinate system; the second step uses a multi-table integration approach to compare the projected data. We rely on efficient and well-known methodologies for each step. The projection step is achieved by retrieving a distance matrix for each representation and then applying multidimensional scaling to provide a new set of coordinates from all the pairwise distances. The integration step is achieved by applying a multiple factor analysis to the multiple tables of new coordinates. This procedure provides tools to integrate and compare data available, for instance, as tree or network structures. Our approach is complementary to kernel methods, traditionally used to answer the same question.

Conclusion: Our approach is evaluated on simulations and used to analyze two real-world data sets: first, we compare several clusterings for different cell types obtained from a transcriptomics single-cell data set in mouse embryos; second, we use our procedure to aggregate a multi-table data set from the TCGA breast cancer database, in order to compare several protein networks inferred for different breast cancer subtypes.
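The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the data are random and hypothetical, the tree-specific distance is assumed to be the cophenetic distance (the abstract does not name one), multidimensional scaling is taken to be classical (Torgerson) MDS, and multiple factor analysis is approximated as a PCA of the concatenated tables after dividing each table by its first singular value. The helper names `classical_mds` and `mfa_scores` are illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist, squareform


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: coordinates from a square distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centred Gram matrix built from the squared distances.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


def mfa_scores(tables, n_components=2):
    """MFA-style weighted PCA: each table is scaled by its first singular value."""
    weighted = []
    for table in tables:
        centred = table - table.mean(axis=0)
        first_sv = np.linalg.svd(centred, compute_uv=False)[0]
        weighted.append(centred / first_sv)
    stacked = np.hstack(weighted)
    u, s, _ = np.linalg.svd(stacked, full_matrices=False)
    # Coordinates of the individuals on the first global axes.
    return u[:, :n_components] * s[:n_components]


rng = np.random.default_rng(0)
n_individuals = 12
# Hypothetical example: two data sources measured on the same individuals.
sources = [rng.normal(size=(n_individuals, 5)),
           rng.normal(size=(n_individuals, 8))]

# Step 1: one tree per source -> distance matrix -> MDS coordinates.
projected = []
for data in sources:
    tree = linkage(pdist(data), method="average")
    coph = squareform(cophenet(tree))  # cophenetic distances between leaves
    projected.append(classical_mds(coph, n_components=3))

# Step 2: integrate the tables of new coordinates.
scores = mfa_scores(projected, n_components=2)
print(scores.shape)
```

For networks, the same sketch would apply with a graph-based distance matrix (for example shortest-path lengths, again an assumption) in place of the cophenetic one. Dividing each table by its first singular value keeps any single source from dominating the first global axis.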

Highlights

Integrating data from different sources is a recurring question in computational biology
Step 1 is done by retrieving a distance matrix specific to either trees or networks and applying multidimensional scaling (MDS), which provides a new set of coordinates from all these pairwise distances
To assess the differences between clusterings, we used the Adjusted Rand Index (ARI) [38, 39] from the aricode R-package [40], which measures the agreement between two classifications
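As an illustration of the agreement measure mentioned in the last highlight, here is a minimal Python sketch of the Adjusted Rand Index computed from pair counts. The source uses the aricode R package; this sketch only follows the standard pair-counting formula and is not that package's implementation. The function name `adjusted_rand_index` and the toy labels are assumptions made for the example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: agreement is trivial

    # Pairs of items grouped together in both, in the first, in the second.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = pairs_a * pairs_b / total_pairs  # value expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are trivial
    return (pairs_joint - expected) / (max_index - expected)


# Same partition under different label names: perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
# Partially matching partitions.
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]), 4))  # 0.5714
```

The index equals 1 for identical partitions, is close to 0 for partitions that agree no more than chance would predict, and can be negative when agreement is below chance.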

Summary

Introduction

Integrating data from different sources is a recurring question in computational biology. In doing so, we are often confronted with the problem of comparing outcomes from different types of data, with various forms of representation [1,2,3]. As a simple example in genomics, several hierarchical clusterings of individuals can be obtained from transcriptomics, proteomics or metagenomics experiments, giving rise to several tree-like representations that need to be compared and eventually aggregated. Such an analysis is essential to better understand the data and to obtain a consensus clustering from coherent trees. Most methods developed for this purpose and described in [4] involve similar objects for integration in practice.

Journal: BMC Bioinformatics
Publication Date: Aug 4, 2021
Citations: 5
License type: open-access
