Integrating functional genomics data using maximum likelihood based simultaneous component analysis

Robert A Van Den Berg, Katrijn Van Deun, Iven Van Mechelen, Henk Al Kiers, Tom F Wilderjans, Age K Smilde

doi:10.1186/1471-2105-10-340

Open Access

https://doi.org/10.1186/1471-2105-10-340

Abstract

Background: In contemporary biology, complex biological processes are increasingly studied by collecting and analyzing measurements of the same entities with different analytical platforms. Such data comprise a number of data blocks that are coupled via a common mode. The goal of collecting this type of data is to discover biological mechanisms that underlie the behavior of the variables in the different data blocks. The simultaneous component analysis (SCA) family of data analysis methods is suited for this task. However, an SCA may be hampered by the data blocks being subject to different amounts of measurement error, or noise. To unveil the true mechanisms underlying the data, it could be fruitful to take noise heterogeneity into consideration in the data analysis. Maximum likelihood based SCA (MxLSCA-P) was developed for this purpose. In a previous simulation study it outperformed normal SCA-P. That study, however, did not mimic typical functional genomics data sets in several respects, such as data blocks coupled via the experimental mode, more variables than experimental units, and medium to high correlations between variables. Here, we present a new simulation study in which the usefulness of MxLSCA-P compared to ordinary SCA-P is evaluated within a typical functional genomics setting. Subsequently, the performance of the two methods is evaluated by analysis of a real life Escherichia coli metabolomics data set.

Results: In the simulation study, MxLSCA-P outperforms SCA-P in terms of recovery of the true underlying scores of the common mode and of the true values underlying the data entries. The advantage of MxLSCA-P was especially pronounced when the simulated data blocks were subject to different noise levels. In the analysis of an E. coli metabolomics data set, MxLSCA-P provided a slightly better and more consistent interpretation.

Conclusion: MxLSCA-P is a promising addition to the SCA family. The analysis of coupled functional genomics data blocks could benefit from its ability to take different noise levels per data block into consideration and improve the recovery of the true patterns underlying the data. Moreover, the maximum likelihood based approach underlying MxLSCA-P could be extended to custom-made solutions to specific problems encountered.
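To make the idea of taking different noise levels per data block into consideration more concrete, the following is a minimal sketch and not the authors' MxLSCA-P algorithm. It assumes independent Gaussian noise with one variance per block, and alternates between a simultaneous component fit in which each block is divided by its current noise standard deviation and a re-estimate of the noise variance from the residuals. The function name `noise_weighted_sca`, its parameters, and the stopping rule are hypothetical choices made for illustration only.

```python
import numpy as np

def noise_weighted_sca(blocks, n_components, n_iter=50, tol=1e-10):
    """Illustrative noise-weighted SCA (an assumption, not the published method).

    blocks: list of arrays sharing the same rows (objects); columns are variables.
    Returns common scores T, per-block loadings P, and per-block noise variances.
    """
    # Column-center every block so components describe variation around the mean.
    centered = [X - X.mean(axis=0) for X in blocks]
    sigma2 = np.ones(len(centered))  # start from equal noise variances
    for _ in range(n_iter):
        # Down-weight noisy blocks: divide each block by its noise standard deviation.
        X_w = np.hstack([X / np.sqrt(v) for X, v in zip(centered, sigma2)])
        U, _, _ = np.linalg.svd(X_w, full_matrices=False)
        T = U[:, :n_components]  # common scores with orthonormal columns
        # Least-squares loadings on the original scale, given orthonormal T.
        P = [X.T @ T for X in centered]
        # Re-estimate each block's noise variance from its residuals.
        new_sigma2 = np.array(
            [np.mean((X - T @ Pk.T) ** 2) for X, Pk in zip(centered, P)]
        )
        new_sigma2 = np.maximum(new_sigma2, 1e-12)  # guard against division by zero
        converged = np.max(np.abs(new_sigma2 - sigma2)) < tol
        sigma2 = new_sigma2
        if converged:
            break
    return T, P, sigma2
```

With equal noise variances the weighting has no effect and the sketch reduces to an ordinary simultaneous component fit on the concatenated blocks, which is consistent with the abstract's observation that the benefit is largest when noise levels differ between blocks.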

Highlights

In contemporary biology, complex biological processes are increasingly studied by collecting and analyzing measurements of the same entities with different analytical platforms.
In the present study, the previous simulation study was extended to address problems typical of functional genomics: (i) the data were coupled via the experimental mode, (ii) the simulations were based on correlation structures observed in real life data sets, and (iii) collinearity was induced by ensuring the data had more variables than objects.
Our results showed that MxLSCA-P outperforms SCA-P on simulated data that mimic functional genomics data more closely.

Summary

Introduction

Complex biological processes are increasingly studied by collecting and analyzing measurements of the same entities with different analytical platforms. Such data comprise a number of data blocks that are coupled via a common mode. It is becoming more widespread to study complex biological processes by collecting and analyzing measurements on the same entities from different sources, such as transcriptomics, metabolomics, ChIP-chip, or proteomics. The experimental units, referred to as objects, constitute the experimental mode of the data, and the measured biochemical compounds constitute the variable mode. We will refer to such matrices, consisting of measurements originating from different sources, as data blocks. Ishii and coworkers [1] simultaneously collected metabolomics, transcriptomics, and proteomics measurements from Escherichia coli chemostat cultures with different mutants and environmental conditions. This yields measurements coupled via the experimental mode. This occurs, for instance, in experiments in which transcriptomics measurements are coupled with ChIP-chip measurements [4], or even with ChIP-chip and motif data [5].
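The layout of data blocks coupled via the experimental mode can be illustrated with a short sketch. It assumes that each block is stored as an objects-by-variables matrix and that an ordinary SCA-P fit can be obtained from a truncated singular value decomposition of the column-centered, side-by-side concatenated blocks. The function name `sca_p` and the choice of centering are assumptions made for this illustration, not details taken from the paper.

```python
import numpy as np

def sca_p(blocks, n_components):
    """Illustrative SCA-P fit for blocks that share their rows (objects).

    blocks: list of arrays of shape (n_objects, n_variables_k).
    Returns common scores T and a list of block-specific loadings P_k.
    """
    # Column-center each block (an assumed preprocessing step).
    centered = [X - X.mean(axis=0) for X in blocks]
    # Coupling via the experimental mode: the blocks are placed side by side.
    X_cat = np.hstack(centered)
    U, s, Vt = np.linalg.svd(X_cat, full_matrices=False)
    T = U[:, :n_components]                          # scores common to all blocks
    P_cat = Vt[:n_components].T * s[:n_components]   # stacked loadings
    # Split the stacked loadings back into one loading matrix per block.
    split_points = np.cumsum([X.shape[1] for X in blocks])[:-1]
    P = np.split(P_cat, split_points, axis=0)
    return T, P

# Example: 10 objects measured on two hypothetical platforms with 30 and 50
# variables, so each block has more variables than objects.
rng = np.random.default_rng(1)
scores = rng.normal(size=(10, 2))
block_a = scores @ rng.normal(size=(2, 30))
block_b = scores @ rng.normal(size=(2, 50))
T, P = sca_p([block_a, block_b], n_components=2)
```

Each block is then approximated by `T @ P[k].T`, so the same scores describe the objects across all platforms while the loadings remain specific to each block.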

Methods

Results

Discussion

Conclusion

Journal: BMC Bioinformatics	Publication Date: Oct 16, 2009
Citations: 43	License type: cc-by

Similar Papers

Clusterwise simultaneous component analysis for analyzing structural differences in multivariate multiblock data.
Kim De Roover ... Eva Ceulemans
Psychological Methods | VOL. 17
01 Mar 2012

Abstract 2282: VIP: A system biology platform for cell line centric integrative analysis of molecular and functional genomics data
Julio Fernandez ... Zhengyan Kan
Cancer Research | VOL. 78
01 Jul 2018

Simultaneous analysis of coupled data matrices subject to different amounts of noise
Tom F Wilderjans ... Eva Ceulemans
British Journal of Mathematical and Statistical Psychology | VOL. 64
15 Apr 2011

Common and distinct variation in data fusion of designed experimental data
Masoumeh Alinaghi ... Age K Smilde
Metabolomics | VOL. 16
03 Dec 2019
