Abstract

Clusters of genes in co-expression networks are commonly used as functional units for gene set enrichment detection and, increasingly, as features (attribute construction) for statistical inference and sample classification. A practical challenge of clustering for these purposes is to identify an optimal partition of the network whose individual clusters are neither too large, which prohibits interpretation, nor too small, which precludes general inference. Newman Modularity is a spectral clustering algorithm that automatically determines the number of clusters, but for many biological networks the resulting cluster sizes are suboptimal. In this work, we generalize Newman Modularity to incorporate information from indirect paths in RNA-seq co-expression networks. We implement a merge-and-split algorithm that lets the user constrain the range of cluster sizes: large enough to capture genes in relevant pathways, yet small enough to resolve distinct functions. We investigate the properties of our recursive indirect-pathways modularity (RIP-M) and compare it with other clustering methods using simulated co-expression networks and RNA-seq data from an influenza vaccine response study. RIP-M had higher cluster assignment accuracy than Newman Modularity in all simulated scenarios, and accuracy comparable to Weighted Gene Correlation Network Analysis (WGCNA). RIP-M was more accurate than WGCNA for modest hard thresholds and comparable for high hard thresholds, while WGCNA was slightly more accurate for soft thresholds. In the vaccine study data, RIP-M and WGCNA enriched for a comparable number of immunologically relevant pathways.
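
As a rough illustration of the two ingredients named above, the following Python sketch (using NumPy) bisects a network by the sign of the leading eigenvector of Newman's modularity matrix and then enforces a size range by splitting and merging. It rests on assumptions that this abstract does not state: the indirect-path term `gamma * A @ A` (length-two paths) is a hypothetical stand-in for RIP-M's actual weighting; the rule "recursively split clusters larger than `max_size`, then merge clusters smaller than `min_size` into the cluster they are most strongly connected to" is our guess at the merge-and-split step; and the names `modularity_matrix`, `split_group`, and `size_constrained_clusters` are illustrative, not the authors' code.

```python
import numpy as np


def modularity_matrix(A, gamma=0.0):
    """Newman modularity matrix B = A' - k k^T / (2m) of a weighted network.

    A' = A + gamma * A @ A adds weight from length-two (indirect) paths.
    This gamma term is a hypothetical illustration, not RIP-M's published form.
    """
    A = np.asarray(A, dtype=float)
    Ap = A + gamma * (A @ A)
    np.fill_diagonal(Ap, 0.0)            # ignore self-loops
    k = Ap.sum(axis=1)                   # weighted degrees; k.sum() == 2m
    return Ap - np.outer(k, k) / k.sum()


def split_group(B, idx):
    """Bisect nodes `idx` by the sign of the leading eigenvector of Newman's
    generalized modularity matrix B^(g)_ij = B_ij - delta_ij * sum_{l in g} B_il.
    Returns None if the leading eigenvalue is not positive (group indivisible)."""
    Bg = B[np.ix_(idx, idx)]
    Bg = Bg - np.diag(Bg.sum(axis=1))
    vals, vecs = np.linalg.eigh(Bg)      # symmetric; eigenvalues in ascending order
    if vals[-1] <= 1e-10:
        return None
    v = vecs[:, -1]
    left, right = idx[v > 0], idx[v <= 0]
    if len(left) == 0 or len(right) == 0:
        return None
    return left, right


def size_constrained_clusters(A, min_size=10, max_size=50, gamma=0.0):
    """Assumed merge-and-split: split clusters above max_size, then merge
    clusters below min_size into their most strongly connected neighbor."""
    A = np.asarray(A, dtype=float)
    B = modularity_matrix(A, gamma)
    n = A.shape[0]

    # Split phase: recursively bisect any cluster larger than max_size.
    pending, clusters = [np.arange(n)], []
    while pending:
        idx = pending.pop()
        parts = split_group(B, idx) if len(idx) > max_size else None
        if parts is None:
            clusters.append(idx)
        else:
            pending.extend(parts)

    # Merge phase: fold the smallest undersized cluster into the cluster that
    # shares the most edge weight with it (this may push it above max_size).
    while len(clusters) > 1:
        clusters.sort(key=len)
        small = clusters[0]
        if len(small) >= min_size:
            break
        links = [A[np.ix_(small, other)].sum() for other in clusters[1:]]
        target = 1 + int(np.argmax(links))
        clusters[target] = np.concatenate([clusters[target], small])
        clusters.pop(0)

    labels = np.empty(n, dtype=int)
    for c, idx in enumerate(clusters):
        labels[idx] = c
    return labels
```

For example, on a weighted network with three dense blocks of 20, 25, and 30 genes joined by weak edges, `size_constrained_clusters(A, min_size=10, max_size=35)` should recover the three blocks, while raising `min_size` to 22 folds the 20-gene block into its most strongly connected neighbor. Because merged clusters can exceed `max_size`, a rule like this one treats the size range as a guideline rather than a hard limit.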

Highlights

  • Modularity is a property of many complex systems where the components of the system are organized into functional subunits or modules

  • We compared the average accuracy of Newman Modularity, recursive indirect-pathways modularity (RIP-M) and Weighted Gene Correlation Network Analysis (WGCNA) in finding the correct number of clusters and the correct cluster identities of all 400 simulated genes, as measured by the Rand index

  • We specified a minimum cluster size of 10 and a maximum of 50 as parameters for the RIP-M and WGCNA algorithms. This range is a guideline rather than a strict rule, because cluster sizes routinely fall outside it for both algorithms; we find that the simulation results are not sensitive to this range
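
The Rand index used for the accuracy comparison in these highlights counts, over all pairs of genes, how often two partitions agree on whether the pair shares a cluster. A minimal Python sketch follows; the function name `rand_index` is illustrative, and it computes the plain (unadjusted) Rand index, on the assumption that no adjusted variant was used, since the highlights do not say.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Plain (unadjusted) Rand index between two partitions of the same genes:
    the fraction of gene pairs on which the partitions agree, i.e. both put
    the pair in one cluster or both put it in different clusters."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two genes")
    agree = total = 0
    for i, j in combinations(range(n), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agree += int(same_true == same_pred)
        total += 1
    return agree / total
```

For instance, `rand_index([0, 0, 1, 1], [0, 0, 1, 2])` returns 5/6 (about 0.833): the partitions disagree only on the last two genes, which the first labeling groups together and the second separates. Because only co-membership matters, renaming clusters does not change the score, so the index rewards recovering both the right number of clusters and the right assignments. For 400 genes the loop visits 79,800 pairs, which is still fast; scikit-learn's `sklearn.metrics.rand_score` computes the same quantity.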

Summary

Introduction

Modularity is a property of many complex systems where the components of the system are organized into functional subunits or modules. Modular organization can be observed in engineered systems such as the hardware components of a computer, packages in software, or parts of a vehicle. Modularity is also observed in evolved systems, such as the organization of DNA into chromosomes and spatial [...]. It is commonly held that cellular organization and biochemical function are modular in nature (Hartwell et al, 1999; Mitra et al, 2013). There are likely multiple selective pressures that lead to modularity in evolved systems, one of which may be how frequently the environment changes (Parter et al, 2007). If an environment is relatively static, an evolving system has the luxury of building large modules, whereas in a rapidly changing environment there is a greater advantage to building and reusing smaller, robust functional subunits (modules).

