Weighted Correlation Network Analysis (WGCNA) Applied to the Tomato Fruit Metabolome

Matthew V Dileo, Owen A Hoekenga, Meghan Den Bakker, Gary D Strahan, Peter Csermely

doi:10.1371/journal.pone.0026683

Abstract

Background

Advances in “omics” technologies have revolutionized the collection of biological data. A matching revolution in our understanding of biological systems, however, will only be realized when similar advances are made in informatic analysis of the resulting “big data.” Here, we compare the capabilities of three conventional and novel statistical approaches to summarize and decipher the tomato metabolome.

Methodology

Principal component analysis (PCA), batch learning self-organizing maps (BL-SOM) and weighted gene co-expression network analysis (WGCNA) were applied to a multivariate NMR dataset collected from developmentally staged tomato fruits belonging to several genotypes. While PCA and BL-SOM are appropriate and commonly used methods, WGCNA holds several advantages in the analysis of highly multivariate, complex data.

Conclusions

PCA separated the two major genetic backgrounds (AC and NC), but provided little further information. Both BL-SOM and WGCNA clustered metabolites by expression, but WGCNA additionally defined “modules” of co-expressed metabolites explicitly and provided additional network statistics that described the systems properties of the tomato metabolic network. Our first application of WGCNA to tomato metabolomics data identified three major modules of metabolites that were associated with ripening-related traits and genetic background.

Highlights

The technologies common to systems biology approaches – transcriptomics, proteomics, ionomics and metabolomics – are capable of generating data orders of magnitude more efficiently than was previously possible
Our first application of weighted gene co-expression network analysis (WGCNA) to tomato metabolomics data identified three major modules of metabolites that were associated with ripening-related traits and genetic background
Network analyses have been proposed as a solution to systems biology studies, particularly those involving transcriptomic datasets, as this approach both models the interactions of real biological networks and is intuitively understood by users [4,5,6]

Summary

Introduction

The technologies common to systems biology approaches – transcriptomics, proteomics, ionomics and metabolomics (the “omics”) – are capable of generating data orders of magnitude more efficiently than was previously possible. This increasingly economical flood of data is placing very significant limitations on the ability of scientists to store, process and analyze it [1,2]. The clustering of co-expressed molecules into “modules” mirrors regulatory associations found in biological systems and provides information on unknown nodes through “guilt by association” with well-characterized ones. Such analyses can be focused on identifying properties associated with key molecules or can be applied in a non-targeted manner, where the networks themselves are the primary focus of interest. A matching revolution in our understanding of biological systems will only be realized when similar advances are made in informatic analysis of the resulting “big data.” Here, we compare the capabilities of three conventional and novel statistical approaches to summarize and decipher the tomato metabolome.
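As a rough illustration of how co-expressed molecules can be grouped into modules, the following Python sketch builds an unsigned weighted adjacency, a_ij = |cor(x_i, x_j)|^beta, from a handful of abundance profiles and then groups strongly connected molecules. It is a simplification under stated assumptions, not the procedure used in this study: the profiles and the names `m1` to `m4` are invented toy values, `beta = 6` and `cutoff = 0.5` are arbitrary illustrative choices, and module detection is reduced to connected components above a cutoff, whereas WGCNA itself is typically run with a topological overlap measure and hierarchical clustering.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned weighted adjacency: a_ij = |cor(x_i, x_j)| ** beta."""
    names = list(profiles)
    adj = {}
    for i in names:
        for j in names:
            if i == j:
                adj[(i, j)] = 1.0
            else:
                adj[(i, j)] = abs(pearson(profiles[i], profiles[j])) ** beta
    return adj

def connectivity(adj, names):
    """Weighted connectivity: sum of a node's edge weights to all other nodes."""
    return {i: sum(adj[(i, j)] for j in names if j != i) for i in names}

def simple_modules(adj, names, cutoff=0.5):
    """Simplified stand-in for module detection: connected components of
    the graph that keeps only edges with adjacency >= cutoff."""
    unassigned = list(names)
    found = []
    while unassigned:
        stack = [unassigned.pop(0)]
        members = []
        while stack:
            node = stack.pop()
            members.append(node)
            linked = [n for n in unassigned if adj[(node, n)] >= cutoff]
            for n in linked:
                unassigned.remove(n)
            stack.extend(linked)
        found.append(sorted(members))
    return found

# Invented toy profiles (five samples each); not data from this study.
profiles = {
    "m1": [1, 2, 3, 4, 5],
    "m2": [2, 4, 6, 8, 10.5],
    "m3": [5, 3, 4, 1, 2],
    "m4": [10, 6, 8, 2, 4.5],
}
names = list(profiles)
adj = soft_threshold_adjacency(profiles, beta=6)
k = connectivity(adj, names)
found = simple_modules(adj, names, cutoff=0.5)
print(found)
```

Raising the absolute correlation to a power greater than one suppresses weak correlations while leaving strong ones nearly intact, which is why the two tightly correlated pairs in the toy data end up in separate groups even though every pair of profiles is at least moderately correlated.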

Methods

Results

Discussion

Conclusion