Abstract

Mass spectrometry technologies are widely used in ionomics and metabolomics to simultaneously profile the intracellular concentrations of, e.g., amino acids or elements in genome-wide mutant libraries. These molecular or sub-molecular features are generally non-Gaussian, and their covariance reveals patterns of correlations that reflect the systems-level nature of cell biochemistry and biology. Here, we introduce two similarity measures, the Mahalanobis cosine and the hybrid Mahalanobis cosine, that incorporate information from the empirical covariance matrix of omics data from high-throughput screening and that can be used to quantify similarities between the profiled features of different mutants. We evaluate the performance of these similarity measures in the task of inferring and integrating genetic networks from short-profile ionomics/metabolomics data through an analysis of experimental data sets related to the ionome and the metabolome of the model organism Saccharomyces cerevisiae. The study of the resulting ionome–metabolome S. cerevisiae multilayer genetic network, which encodes multiple omic-specific levels of correlations between genes, shows that the proposed measures can provide an alternative description of relations between biological processes when compared to the commonly used Pearson’s correlation coefficient, and that they have the potential to guide the construction of novel hypotheses on the function of uncharacterised genes.
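
To make the idea of a covariance-aware similarity concrete, the sketch below implements one common formulation of a Mahalanobis cosine: the ordinary cosine similarity with the inner product replaced by x^T S^{-1} y, where S is the empirical feature covariance matrix. This is an illustrative assumption and may differ in detail from the definition used in the paper; the hybrid Mahalanobis cosine is not reproduced here. The function name, the toy data and the use of a pseudo-inverse are all hypothetical choices, and the profiles are assumed to be already centred.

```python
import numpy as np

def mahalanobis_cosine(x, y, cov_inv):
    """Cosine of the angle between x and y under the inner product a^T cov_inv b.

    Assumed formulation for illustration; not taken from the paper.
    """
    num = x @ cov_inv @ y
    den = np.sqrt(x @ cov_inv @ x) * np.sqrt(y @ cov_inv @ y)
    return num / den

# Toy data (made up): rows are mutants (N = 200), columns are features (M = 5).
rng = np.random.default_rng(0)
profiles = rng.normal(size=(200, 5))
profiles[:, 1] += 0.8 * profiles[:, 0]  # induce a feature-feature correlation

cov = np.cov(profiles, rowvar=False)  # empirical M x M feature covariance
cov_inv = np.linalg.pinv(cov)         # pseudo-inverse tolerates a singular estimate

# Similarity between the profiles of the first two mutants.
sim = mahalanobis_cosine(profiles[0], profiles[1], cov_inv)
```

With the identity matrix in place of `cov_inv`, the function reduces to the ordinary cosine similarity, which makes the role of the covariance matrix easy to isolate when comparing measures.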

Highlights

  • The development and reduction in cost of high-throughput technologies in the post-genomic era has made possible genome-wide screening experiments that measure the molecular phenotypes observed in response to single gene alterations, such as deletion, or as a result of an increase in expression of the protein coding sequence [1,2,3,4]

  • The final mutant-related feature profiles show how many standard deviations the concentrations deviate from the median concentration measured across all the strains in that data set. These data sets have been obtained using a similar experimental design, and they present the general characteristics of short-profile omic data sets discussed in Section 2.1. In Table 1, we report the average absolute feature skewness (AAFS) and the ratio M/N of the number of features to the number of genes; in Figure 1, we show the patterns of feature–feature correlations extracted from each experimental data set using the Pearson correlation coefficient

  • We have focused on the problem of inferring and integrating association networks between genes from omic data sets containing a relatively small number (of order 10) of biological signatures, profiled for almost all single non-essential gene mutants. These signatures contain comprehensive information on the intracellular concentration of elements or of classes of metabolites, and they present patterns of correlations that reflect those biological and biochemical processes inside the cell in which these concentrations play a fundamental role. The importance of these omic data lies in the fact that the associated studies and methodologies have been proposed as functional omic approaches, alternative to classic functional genomics, that can reveal undiscovered relations between genes encoded in the specific omic-related signatures
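
The quantities named in the highlights (median-referenced profiles, AAFS and the M/N ratio) can be sketched as follows. This is a minimal illustration under stated assumptions: the standard deviation is taken per feature across strains, skewness is the moment-based estimator m3 / m2^(3/2), and AAFS is the mean of its absolute value over features. The excerpt does not specify the exact estimators, so the function names and the toy data are hypothetical.

```python
import numpy as np

def median_standardise(conc):
    """Express each concentration as standard deviations from the per-feature median.

    Assumes the standard deviation is computed per feature across all strains.
    """
    med = np.median(conc, axis=0)
    sd = np.std(conc, axis=0, ddof=1)
    return (conc - med) / sd

def average_absolute_feature_skewness(profiles):
    """Mean over features of |skewness|, using the moment estimator m3 / m2**1.5."""
    centred = profiles - profiles.mean(axis=0)
    m2 = (centred ** 2).mean(axis=0)
    m3 = (centred ** 3).mean(axis=0)
    return np.mean(np.abs(m3 / m2 ** 1.5))

# Toy concentrations (made up): N = 500 strains, M = 10 features, right-skewed.
rng = np.random.default_rng(1)
conc = rng.lognormal(mean=0.0, sigma=0.5, size=(500, 10))

z = median_standardise(conc)
n_genes, n_features = z.shape
aafs = average_absolute_feature_skewness(z)
ratio = n_features / n_genes  # the M/N ratio; small for short-profile data
```

Because skewness is unchanged by shifting and positive rescaling, the AAFS of the standardised profiles equals that of the raw concentrations; it summarises how far the features depart from a symmetric, Gaussian-like shape.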


Summary

Introduction

The development and reduction in cost of high-throughput technologies in the post-genomic era has made possible genome-wide screening experiments that measure the molecular phenotypes observed in response to single gene alterations, such as deletion, or as a result of an increase in expression of the protein coding sequence [1,2,3,4]. Because these molecular or sub-molecular signatures can be mapped to, and associated with, a consistent region of the genome, statistical inference techniques are often applied to extract genetic networks from correlations that reflect the interplay between gene function, molecular signatures, and environmental factors. Starting from theoretical considerations on the characteristic structure of short-profile omic data, we develop and apply a methodology to quantify the advantages that the proposed similarity measures may have in the task of extracting biologically meaningful genetic networks. We do this by considering three experimental benchmark data sets of the ionome and the metabolome of the model organism

