Abstract

With the advent of high-throughput technologies for data acquisition from different components (i.e., genes, proteins, and metabolites) of a given biological system, generation of hypotheses and biological interpretations based on multivariate data sets become increasingly important. These technologies allow for simultaneous gathering of data from the same biological components under different perturbations, including genotypic variation and/or changes in conditions, resulting in so-called multiple data tables. Moreover, these data tables are obtained over a well-chosen time domain to capture the dynamics of the response of the biological system to the perturbation. The computational problem we address in this study is twofold: (1) derive a single data table, referred to as a compromise, which captures information common to the investigated set of multiple tables and (2) identify biological components which contribute most to the determined compromise. Here we argue that recent extensions to principal component analysis called STATIS and dual-STATIS can be used to determine the compromise, to which classical techniques for data analysis, such as clustering and term over-enrichment, can subsequently be applied. In addition, we illustrate that STATIS and dual-STATIS facilitate interpretations of a publicly available transcriptomics data set capturing the time-resolved response of Arabidopsis thaliana to changing light and/or temperature conditions. We demonstrate that STATIS and dual-STATIS can be used not only to identify the components of a biological system whose behavior is similarly affected by the perturbation (e.g., in time or condition), but also to specify the extent to which each dimension of the data tables reflects the perturbation. These findings ultimately provide insights into the components and pathways which could be under tight control in plant systems.

Highlights

  • High-throughput technologies are routinely applied to obtain a snapshot of plant systems operating under a given environmental condition

  • The K = 8 tables correspond to the eight environmental conditions obtained from the Arabidopsis experiment described in Section “Materials and Methods,” where the variables denote the 23 time points measured for each condition and the observations correspond to the 2,276 genes which are obtained using the aforementioned filtering strategy

  • This indicates that the transcriptomics changes under normal temperature regimes coupled with darkness/low-light conditions are characteristic for the entire data set, superimposing the changes in the remaining combinations of conditions

Introduction

High-throughput technologies are routinely applied to obtain a snapshot of plant systems operating under a given environmental condition. The resulting multivariate data sets gathered from the same set of biological entities (e.g., genes) under various conditions require the development of methods for simultaneous analysis of multiple data sets (or data tables). The goal of this study is to introduce STATIS and dual-STATIS for the analysis and interpretation of transcriptomics data over the same set of genes under varying, but not necessarily independent, environmental conditions sampled at the same time points starting from a well-defined reference. The idea of STATIS and dual-STATIS is based on integrating a given set of data tables into an optimum weighted average, called a compromise, which captures what is common to all or a subset of the analyzed tables. Since we consider the case where no supervised information is available about the gene labels, it is possible to apply classical unsupervised learning techniques to the resulting compromise. The approach presented in this study may be regarded as an instance of the multi-way unsupervised learning problem, which requires decomposition of a multidimensional table (Geladi, 1989).
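To make the notion of a compromise concrete, the following is a minimal sketch of the core STATIS computation in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function names (`statis_compromise`, `factor_scores`), the column centering, the scaling of each cross-product matrix to unit Frobenius norm, and the synthetic data are all choices made here for the example. The sketch builds one genes-by-genes cross-product matrix per table, measures the similarity between tables, takes the leading eigenvector of that similarity matrix as table weights, and forms the compromise as the weighted average. Dual-STATIS is not shown.

```python
import numpy as np


def statis_compromise(tables):
    """Compute a STATIS-style compromise from K tables sharing the same rows.

    tables: list of K arrays, each of shape (n_obs, n_vars_k).
    Returns (compromise, weights, similarity).
    Preprocessing here (centering, unit-norm scaling) is an assumption.
    """
    cross_products = []
    for X in tables:
        Xc = X - X.mean(axis=0)        # center each variable (column)
        S = Xc @ Xc.T                  # n_obs x n_obs cross-product matrix
        S = S / np.linalg.norm(S)      # scale to unit Frobenius norm
        cross_products.append(S)

    # Similarity between tables: inner products of the normalized
    # cross-product matrices (equal to the RV coefficient after scaling).
    similarity = np.array(
        [[np.sum(Si * Sj) for Sj in cross_products] for Si in cross_products]
    )

    # Leading eigenvector of the similarity matrix gives the table weights.
    _, eigvecs = np.linalg.eigh(similarity)   # eigenvalues in ascending order
    u = np.abs(eigvecs[:, -1])                # fix the arbitrary sign
    weights = u / u.sum()                     # rescale so weights sum to one

    compromise = sum(w * S for w, S in zip(weights, cross_products))
    return compromise, weights, similarity


def factor_scores(compromise, n_components=2):
    """Row (e.g., gene) coordinates from the eigendecomposition of the compromise."""
    eigvals, eigvecs = np.linalg.eigh(compromise)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)


# Synthetic stand-in for K = 8 condition tables with 23 time points each;
# 50 rows are used instead of the full gene set to keep the example small.
rng = np.random.default_rng(0)
shared = rng.normal(size=(50, 23))
tables = [shared + 0.5 * rng.normal(size=(50, 23)) for _ in range(8)]

compromise, weights, similarity = statis_compromise(tables)
scores = factor_scores(compromise, n_components=2)
print("table weights:", np.round(weights, 3))
print("score matrix shape:", scores.shape)
```

Tables that resemble the others receive larger weights, so the compromise emphasizes structure common to the set. The resulting scores could then be passed to a standard clustering routine, in line with the unsupervised analysis described above.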

