Abstract

Random Matrix Theory (RMT) methods for threshold selection had been applied in only a very small number of studies aimed at constructing Gene Co-expression Networks (GCNs), and several open questions remained, especially regarding their general applicability across the diverse data structures of gene expression data sets. Moreover, no clear methodology to follow at each step was available. Here, we show that RMT methodology is, in fact, able to differentiate Gaussian Orthogonal Ensemble (GOE) from Gaussian Diagonal Ensemble (GDE) structure for a large number of simulated data sets, and that the results are similar to those obtained with the reference method based on the clustering coefficient.

Highlights

  • The cell is a system of multiple interacting entities with specific functions, whose intrinsic complexity can be studied under mathematical frameworks such as networks

  • Gene co-expression networks (GCNs) are a common representation of this complex system, as they depict pairs of genes with similar expression profiles and highlight genes that might be functionally related to the same pathway or protein complex

  • Random Matrix Theory (RMT) approaches for threshold selection in complex networks are based on characterizing the nearest neighbour spacing distribution (NNSD) of the eigenvalues of the adjacency matrix (CVETKOVIT et al, 1980; SARIKA et al, 2007). The NNSD represents the probability of finding neighbouring eigenvalues with any given spacing


Summary

Introduction

The cell is a system of multiple interacting entities with specific functions, whose intrinsic complexity can be studied under mathematical frameworks such as networks. RMT approaches for threshold selection in complex networks are based on characterizing the nearest neighbour spacing distribution (NNSD) of the eigenvalues of the adjacency matrix (CVETKOVIT et al, 1980; SARIKA et al, 2007). The NNSD represents the probability of finding neighbouring eigenvalues with any given spacing. This distribution is expected to take particular forms, depending on the correlation structure underlying the eigenvalues. We revisit the method and demonstrate its applicability to GCNs. We use RMT-based simulations of several data sets, constructing random and systemic graphs, to show that the methodology can be applied to a wide range of gene expression data structures. Scripts were written in the R language (R CORE TEAM, 2017) and are available upon request.
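To make the idea of an NNSD concrete, the following is a minimal Python sketch; it is not the authors' R scripts. It rests on two standard RMT results that the summary does not spell out and that are assumed here: spacings from a diagonal (GDE-like) matrix follow the Poisson form P(s) = exp(-s), and spacings from a GOE matrix are described by the Wigner surmise P(s) = (π/2) s exp(-π s²/4). All function names are hypothetical. The GOE sample uses 2x2 matrices, whose eigenvalue spacing has a closed form, so no linear algebra library is needed. The sketch covers only the step of telling the two spacing laws apart on toy ensembles, not the threshold selection procedure itself.

```python
import math
import random
from statistics import NormalDist, fmean


def poisson_logpdf(s):
    """Log of the Poisson NNSD, P(s) = exp(-s)."""
    return -s


def wigner_logpdf(s):
    """Log of the Wigner surmise, P(s) = (pi/2) * s * exp(-pi * s^2 / 4)."""
    return math.log(math.pi / 2.0) + math.log(s) - math.pi * s * s / 4.0


def classify_spacings(spacings):
    """Label spacings 'GOE' or 'GDE' by the higher mean log-likelihood."""
    s = [x for x in spacings if x > 0.0]  # log(s) needs s > 0
    ll_wigner = fmean([wigner_logpdf(x) for x in s])
    ll_poisson = fmean([poisson_logpdf(x) for x in s])
    return "GOE" if ll_wigner > ll_poisson else "GDE"


def gde_spacings(n, rng):
    """Spacings for a diagonal matrix with n standard normal entries.

    The eigenvalues of a diagonal matrix are its diagonal entries. They are
    unfolded with the known normal CDF so that the mean spacing is about 1.
    """
    eigenvalues = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    cdf = NormalDist().cdf
    unfolded = [n * cdf(x) for x in eigenvalues]
    return [b - a for a, b in zip(unfolded, unfolded[1:])]


def goe_spacings(m, rng):
    """Spacings from m independent 2x2 GOE matrices [[a, b], [b, c]].

    The two eigenvalues differ by sqrt((a - c)^2 + 4 b^2). Spacings are
    rescaled to unit mean, the usual NNSD normalisation.
    """
    raw = []
    for _ in range(m):
        a = rng.gauss(0.0, 1.0)
        c = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, math.sqrt(0.5))  # off-diagonal variance 1/2
        raw.append(math.sqrt((a - c) ** 2 + 4.0 * b * b))
    mean = fmean(raw)
    return [x / mean for x in raw]


rng = random.Random(0)
print("diagonal ensemble:", classify_spacings(gde_spacings(2000, rng)))
print("2x2 GOE ensemble: ", classify_spacings(goe_spacings(2000, rng)))
```

For a real adjacency matrix the eigenvalues would come from a numerical eigensolver, and the unfolding step would have to estimate the eigenvalue density from the data rather than use a known CDF as in this toy case.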

Structure of adjacency matrices in RMT
RMT Methodology for threshold selection
Actual data
Reference method for threshold selection
Results and discussion
