Abstract

Metabolic phenotyping technologies based on Nuclear Magnetic Resonance (NMR) spectroscopy and Mass Spectrometry (MS) generate vast amounts of unrefined data from biological samples. Clustering strategies are frequently employed to provide insight into patterns of relationships between samples and metabolites. Here, we propose a bi-clustering strategy driven by non-negative matrix factorization (NMF) for metabolic phenotyping data, in order to discover subsets of interrelated metabolites that exhibit similar behaviour across subsets of samples. The proposed strategy incorporates bi-cross validation and statistical segmentation techniques to automatically determine the number and structure of bi-clusters. This contrasts with the widely used conventional clustering approaches in metabolic studies, which incorporate all molecular peaks and require the number of clusters to be specified a priori. We compare the proposed strategy with other bi-clustering approaches that were developed in the context of genomics and transcriptomics research, and demonstrate its superior performance on both simulated (NMR) and real (MS) bacterial metabolic data.
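This summary does not give the factorization details, so the following is only an illustration of the general idea of NMF-driven bi-clustering, not the authors' implementation. It assumes the standard Lee-Seung multiplicative updates for a rank-$k$ factorization $X \approx WH$ with $W, H \ge 0$, and reads one bicluster off each factor: the samples (rows) with large loadings in column $r$ of $W$ and the variables (columns) with large loadings in row $r$ of $H$. The relative-threshold rule in `biclusters_from_factors` is a hypothetical stand-in for the statistical segmentation step, and the rank is fixed by hand instead of being chosen by bi-cross validation. All function names are illustrative.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


def nmf(X, k, iters=1000, seed=0, eps=1e-12):
    """Rank-k NMF, X ~ W H with W, H >= 0, using Lee-Seung
    multiplicative updates for the squared Frobenius loss."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[r][j] * num[r][j] / (den[r][j] + eps) for j in range(n)]
             for r in range(k)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][r] * num[i][r] / (den[i][r] + eps) for r in range(k)]
             for i in range(m)]
    return W, H


def reconstruction_error(X, W, H):
    """Squared Frobenius norm of X - W H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for xr, yr in zip(X, WH) for x, y in zip(xr, yr))


def biclusters_from_factors(W, H, rel_threshold=0.5):
    """Hypothetical assignment rule (stand-in for statistical segmentation):
    factor r yields the samples (rows) and variables (columns) whose loading
    exceeds rel_threshold times the largest loading in that factor."""
    found = []
    for r, h_row in enumerate(H):
        w_col = [row[r] for row in W]
        rows = frozenset(i for i, v in enumerate(w_col) if v > rel_threshold * max(w_col))
        cols = frozenset(j for j, v in enumerate(h_row) if v > rel_threshold * max(h_row))
        found.append((rows, cols))
    return found


if __name__ == "__main__":
    # Toy matrix: samples 0-2 share variables 0-3; samples 3-5 share variables 4-6.
    X = [[0.0] * 7 for _ in range(6)]
    for i in range(3):
        for j in range(4):
            X[i][j] = 5.0
    for i in range(3, 6):
        for j in range(4, 7):
            X[i][j] = 3.0
    W, H = nmf(X, k=2)
    for rows, cols in biclusters_from_factors(W, H):
        print(sorted(rows), sorted(cols))
```

Multiplicative updates are used here only because they fit in a few dependency-free lines; a practical pipeline would use a library solver and a principled rank-selection step (the abstract names bi-cross validation) instead of the fixed `k=2`.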

Highlights

  • Modern Nuclear Magnetic Resonance (NMR) spectroscopy and Mass Spectrometry (MS) technologies generate vast amounts of unrefined metabolic data in biomedical studies [1,2]

  • We explored the bicluster model on synthetic datasets

  • The value in each bicluster is set as $\beta \in \{1, 2, \ldots, 5\}$, the typical value is set as $\lambda = \min(|X_{ij}|)$, and the noise level is set as $\delta \in [0, 1]$
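The highlights only outline the synthetic setup. A minimal generator could look as follows, under assumptions that this summary does not state: a zero background, every entry of a bicluster set to the constant $\beta$, and additive uniform noise drawn from $[0, \delta)$ so the matrix stays non-negative for NMF. The role of $\lambda = \min(|X_{ij}|)$ is not described here, so the sketch leaves it out. The function name and its parameters are hypothetical.

```python
import random


def make_synthetic_biclusters(m, n, biclusters, beta, delta, seed=0):
    """Hypothetical generator for an m x n non-negative matrix.

    biclusters: iterable of (row_indices, col_indices) pairs.
    Entries inside any bicluster are set to beta (overlaps are not summed);
    every entry then receives uniform noise in [0, delta).
    """
    if beta not in {1, 2, 3, 4, 5}:
        raise ValueError("beta must be in {1, ..., 5}")
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    rng = random.Random(seed)
    X = [[0.0] * n for _ in range(m)]
    for rows, cols in biclusters:
        for i in rows:
            for j in cols:
                X[i][j] = float(beta)
    for i in range(m):
        for j in range(n):
            X[i][j] += delta * rng.random()
    return X


if __name__ == "__main__":
    X = make_synthetic_biclusters(
        6, 8, [(range(0, 3), range(0, 4)), (range(3, 6), range(4, 8))],
        beta=3, delta=0.2)
    for row in X:
        print(" ".join(f"{v:.2f}" for v in row))
```

Sweeping `beta` and `delta` over their stated ranges gives a simple way to probe how signal strength and noise affect bicluster recovery.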

Summary

Introduction

Modern Nuclear Magnetic Resonance (NMR) spectroscopy and Mass Spectrometry (MS) technologies generate vast amounts of unrefined metabolic data in biomedical studies [1,2]. The metabolic signature of a complex biological mixture (‘metabolic profile’), such as that obtained from the analysis of biofluids, consists of overlapping signals from hundreds to thousands of distinct chemical entities influenced by genes, treatment, gut microbiota and other environmental factors. These numerous interacting influences result in complex inter-relationships among both spectral observations (samples) and variables. Given a two-dimensional data matrix X with m rows (samples) and n columns (variables), traditional cluster analysis aims to identify groups of samples (respectively, variables) that exhibit similar behaviour across all variables (respectively, samples). In “-omics” studies, however, molecules (e.g., genes or metabolites) can be involved in one or more biological processes and exhibit similar patterns of behaviour across only a subset of samples.
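A bicluster can be viewed as a submatrix $X_{IJ}$ indexed by a subset $I$ of samples and a subset $J$ of variables. One widely used coherence score from the gene-expression literature, the mean squared residue of Cheng and Church, is zero when the rows of $X_{IJ}$ differ only by additive offsets. It is shown here purely to illustrate the idea of local coherence and is not claimed to be the criterion used in this work; the function name and toy matrix are hypothetical.

```python
def mean_squared_residue(X, rows, cols):
    """Cheng-Church mean squared residue of the submatrix X[rows, cols].

    Zero means the selected rows differ only by additive offsets on the
    selected columns (a perfectly coherent additive bicluster).
    """
    rows, cols = list(rows), list(cols)
    sub = [[X[i][j] for j in cols] for i in rows]
    row_means = [sum(r) / len(cols) for r in sub]
    col_means = [sum(sub[a][b] for a in range(len(rows))) / len(rows)
                 for b in range(len(cols))]
    overall = sum(row_means) / len(rows)
    total = 0.0
    for a, row in enumerate(sub):
        for b, value in enumerate(row):
            residue = value - row_means[a] - col_means[b] + overall
            total += residue ** 2
    return total / (len(rows) * len(cols))


if __name__ == "__main__":
    # Rows are samples, columns are metabolites (toy values).
    X = [[1, 2, 9],
         [2, 3, 0],
         [5, 6, 4],
         [7, 1, 3]]
    print(mean_squared_residue(X, [0, 1, 2], [0, 1]))  # coherent subset
    print(mean_squared_residue(X, range(4), [0, 1]))   # all samples
```

On this toy matrix, metabolites 0 and 1 co-vary across samples 0 to 2, so that submatrix scores zero (up to rounding), while including sample 3 breaks the pattern and raises the score. A clustering that must use all samples would blur exactly this kind of local structure.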
