Abstract
Random Matrix Theory (RMT) methods for threshold selection had been applied in only a small number of studies aimed at constructing Gene Co-expression Networks (GCNs), and several open questions remained, especially regarding their general applicability across the diverse data structures of gene expression data sets. Moreover, no clear methodology to follow at each step was available. Here we show that the RMT methodology is in fact capable of differentiating Gaussian Orthogonal Ensemble (GOE) from Gaussian Diagonal Ensemble (GDE) structure for a large number of simulated data sets, and that the results are similar to those obtained with the reference method based on the clustering coefficient.
Highlights
The cell is a system of multiple interacting entities with specific functions, whose intrinsic complexity can be studied under mathematical frameworks such as networks
Gene co-expression networks (GCNs) are a common representation of this complex system, as they depict pairs of genes with similar expression profiles and highlight genes that might be functionally related through the same pathway or protein complex
Random Matrix Theory (RMT) approaches to threshold selection in complex networks are based on characterizing the nearest neighbour spacing distribution (NNSD) of the eigenvalues of the adjacency matrix (CVETKOVIT et al, 1980; SARIKA et al, 2007). The NNSD represents the probability of finding neighbouring eigenvalues with any given spacing.
Summary
The cell is a system of multiple interacting entities with specific functions, whose intrinsic complexity can be studied under mathematical frameworks such as networks. RMT approaches to threshold selection in complex networks are based on characterizing the nearest neighbour spacing distribution (NNSD) of the eigenvalues of the adjacency matrix (CVETKOVIT et al, 1980; SARIKA et al, 2007). The NNSD represents the probability of finding neighbouring eigenvalues with any given spacing. This distribution is expected to take a characteristic form that depends on the correlation structure underlying the eigenvalues. We revisit the method and demonstrate its applicability to GCNs. We use RMT-based simulations of several data sets, constructing random and systemic graphs, to show that the methodology can be applied to a wide range of gene expression data structures. Scripts were written in the R language (R CORE TEAM, 2017) and are available upon request.
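The contrast between GOE-like and GDE-like spacing statistics can be illustrated with a minimal sketch. It is written in Python with NumPy for illustration only and is not the authors' R code. The function names, the polynomial unfolding, the trimming fraction, and the cutoff of 0.25 are all assumptions made for this example. It relies on the standard RMT expectation that unfolded GOE spacings approximately follow the Wigner surmise, P(s) = (π/2) s exp(−π s²/4), so very small spacings are rare, whereas spacings of uncorrelated eigenvalues follow the Poisson form P(s) = exp(−s), so small spacings are common.

```python
import numpy as np


def unfolded_spacings(eigenvalues, poly_degree=7, trim=0.1):
    """Return nearest-neighbour spacings normalised to mean 1.

    Unfolding is done with a polynomial fit to the eigenvalue staircase
    (an assumed, commonly used choice; other unfolding schemes exist).
    """
    ev = np.sort(np.asarray(eigenvalues, dtype=float))
    n = ev.size
    # Empirical cumulative count of eigenvalues (the "staircase").
    staircase = np.arange(1, n + 1)
    # Standardise the eigenvalues so the polynomial fit is well conditioned.
    x = (ev - ev.mean()) / ev.std()
    coeffs = np.polyfit(x, staircase, poly_degree)
    unfolded = np.polyval(coeffs, x)
    # Drop the spectrum edges, where the smooth fit is least reliable.
    k = int(trim * n)
    spacings = np.diff(unfolded[k:n - k])
    return spacings / spacings.mean()


def small_spacing_fraction(spacings, cutoff=0.25):
    """Fraction of spacings below the cutoff (hypothetical summary statistic)."""
    return float(np.mean(np.asarray(spacings) < cutoff))


def goe_eigenvalues(n, rng):
    """Eigenvalues of a real symmetric matrix with Gaussian entries (GOE-like)."""
    a = rng.normal(size=(n, n))
    return np.linalg.eigvalsh((a + a.T) / 2.0)


def diagonal_eigenvalues(n, rng):
    """Eigenvalues of a diagonal matrix with Gaussian entries: the entries themselves."""
    return rng.normal(size=n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    goe_frac = small_spacing_fraction(unfolded_spacings(goe_eigenvalues(n, rng)))
    gde_frac = small_spacing_fraction(unfolded_spacings(diagonal_eigenvalues(n, rng)))
    print("fraction of small spacings, GOE-like:", goe_frac)
    print("fraction of small spacings, diagonal:", gde_frac)
```

In a thresholding workflow, the same spacing statistic would be computed on the adjacency matrix obtained at each candidate correlation cutoff; how the transition between the two regimes is then detected is a design choice that this sketch does not cover.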