Abstract

Data integration has been shown to provide valuable information. The information extracted using data integration in the form of multiblock analysis can pinpoint both common and unique trends in the different blocks. When working with small multiblock datasets, the number of possible integration methods is drastically reduced. To investigate the application of multiblock analysis in cases with few samples and a lack of statistical power, we studied a small metabolomic multiblock dataset containing six blocks (i.e., tissue types), including only common metabolites. We used a single-model multiblock analysis method called joint and unique multiblock analysis (JUMBA) and compared it to a commonly used method, concatenated principal component analysis (PCA). These methods were used to detect trends in the dataset and identify underlying factors responsible for metabolic variations. Using JUMBA, we were able to interpret the extracted components and link them to relevant biological properties. JUMBA shows how the observations are related to one another, the stability of these relationships, and to what extent each of the blocks contributes to the components. These results indicate that multiblock methods can be useful even with a small number of samples.

Highlights

  • The idea behind data integration is that the combination of datasets is “more than the sum of its parts”, since it contains both the information of the respective blocks and information on their inter-relations [1]

  • The analysis shows that multiblock analysis can provide further insight into the metabolic trends, even in cases with a low number of samples

  • By inspecting the joint and unique multiblock analysis (JUMBA) p3 loadings, we found a contribution from the kidney (see the Table). The metadata correlation matrix plot revealed that the globally joint t2 correlated strongly with the maturity onset diabetes of the young syndrome in the mice


Summary

Introduction

The idea behind data integration (i.e., combining data from different sources) is that the combination of datasets is “more than the sum of its parts”, since it contains both the information of the respective blocks and information on their inter-relations [1]. Here, a block is a data matrix containing measured observations or variables from one source. When several data blocks are available, it can be of interest to identify common variation, i.e., to integrate the data. The basis for data integration is that there is a flow of information from one block to the next. More complex integration methods have been developed [3], such as network analysis, correlation-based analysis [4,5,6,7,8], matrix factorization methods [9,10,11,12,13,14,15,16,17], and Bayesian methods [18,19,20,21,22].
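
To make the notion of blocks and common variation concrete, the following is a minimal sketch of concatenated PCA, the baseline method named in the abstract. It is not the authors' implementation and it is not JUMBA. Several choices here are assumptions made for illustration only: the blocks share the same observations (rows) and are joined column-wise, each block is mean-centred and scaled to unit Frobenius norm so that all blocks carry equal weight, and the data are simulated. The function name `concatenated_pca` and the variable names are hypothetical.

```python
import numpy as np

def concatenated_pca(blocks, n_components=2):
    """PCA on column-wise concatenated blocks that share the same rows."""
    scaled = []
    for X in blocks:
        Xc = X - X.mean(axis=0)                 # centre each variable
        norm = np.linalg.norm(Xc)               # Frobenius norm of the block
        scaled.append(Xc / norm if norm > 0 else Xc)  # equal total weight per block
    Z = np.hstack(scaled)                       # one wide matrix, rows = observations
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    # Fraction of each component's squared loadings that falls within each block
    edges = np.cumsum([0] + [X.shape[1] for X in blocks])
    contrib = np.array([
        (loadings[a:b] ** 2).sum(axis=0)
        for a, b in zip(edges[:-1], edges[1:])
    ])
    return scores, loadings, contrib

# Simulated example (assumption): 8 samples, 6 blocks, 20 shared metabolites,
# with one common trend present in every block plus a little noise.
rng = np.random.default_rng(0)
n_samples, n_metabolites = 8, 20
shared = rng.normal(size=(n_samples, 1))
blocks = [
    shared @ rng.normal(size=(1, n_metabolites))
    + 0.1 * rng.normal(size=(n_samples, n_metabolites))
    for _ in range(6)
]
scores, loadings, contrib = concatenated_pca(blocks)
```

In this sketch, `scores` describes how the observations relate to one another, and each column of `contrib` sums to one, giving a rough view of how much each block contributes to a component. Unlike JUMBA as described in the abstract, this baseline does not separate joint from unique variation.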
