Abstract

In many applications, one is interested in determining which of the properties of a network module change across conditions. For example, to validate the existence of a module, it is desirable to show that it is reproducible (or preserved) in an independent test network. Here we study several types of network preservation statistics that do not require a module assignment in the test network. We distinguish network preservation statistics by the type of the underlying network. Some preservation statistics are defined for a general network (defined by an adjacency matrix) while others are only defined for a correlation network (constructed on the basis of pairwise correlations between numeric variables). Our applications show that the correlation structure facilitates the definition of particularly powerful module preservation statistics. We illustrate that evaluating module preservation is in general different from evaluating cluster preservation. We find that it is advantageous to aggregate multiple preservation statistics into summary preservation statistics. We illustrate the use of these methods in six gene co-expression network applications including 1) preservation of cholesterol biosynthesis pathway in mouse tissues, 2) comparison of human and chimpanzee brain networks, 3) preservation of selected KEGG pathways between human and chimpanzee brain networks, 4) sex differences in human cortical networks, 5) sex differences in mouse liver networks. While we find no evidence for sex specific modules in human cortical networks, we find that several human cortical modules are less preserved in chimpanzees. In particular, apoptosis genes are differentially co-expressed between humans and chimpanzees. Our simulation studies and applications show that module preservation statistics are useful for studying differences between the modular structure of networks. Data, R software and accompanying tutorials can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/ModulePreservation.

Highlights

  • Network methods are frequently used in genomic and systems biologic studies, and in general data mining applications, to describe the pairwise relationships of a large number of variables [1,2]

  • A cross-tabulation based module preservation statistic would be meaningless when modules are defined as gene ontology categories since both reference and test networks contain the same sets of genes

  • A non-trivial question is whether the network connections of a module in the reference network resemble those of the same module in the test network

Read more

Summary

Introduction

Network methods are frequently used in genomic and systems biologic studies, and in general data mining applications, to describe the pairwise relationships of a large number of variables [1,2]. One is interested in studying the properties of network modules and their change across conditions [11,12,13,14,15,16]. This article describes several module preservation statistics for determining which properties of a network module are preserved in a second (test) network. The module preservation statistics allow one to quantify which aspects of within-module topology are preserved between a reference network and a test networks. We use the term ‘‘module’’ in a broad sense: a network module is a subset of nodes that forms a sub-network inside a larger network. Any subset of nodes inside a larger network can be considered a module. This subset may or may not correspond to a cluster of nodes

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call