THRESHOLD SELECTION BASED ON RANDOM MATRIX THEORY FOR GENE CO-EXPRESSION NETWORK

Laura Baracaldo, Liliana Lopez-Kleine, Luis Leal

doi:10.28951/rbb.v36i2.205

Abstract

Random Matrix Theory (RMT) methods for threshold selection had been applied in only a small number of studies aimed at constructing Gene Co-expression Networks (GCN), and several open questions remained, especially regarding their general applicability regardless of the diverse structure of gene expression data sets. Moreover, no clear methodology to follow at each step was available. Here we show that the RMT methodology is, in fact, able to differentiate Gaussian Orthogonal Ensemble (GOE) from Gaussian Diagonal Ensemble (GDE) structure for a large number of simulated data sets, and that its results are similar to those obtained with the reference method based on the clustering coefficient.

Highlights

The cell is a system of multiple interacting entities with specific functions, whose intrinsic complexity can be studied under mathematical frameworks such as networks.
Gene co-expression networks (GCNs) are a common representation of this complex system: they depict the pairs of genes that have similar expression profiles and highlight genes that might be functionally related to the same pathway or protein complex.
Random Matrix Theory (RMT) approaches to threshold selection in complex networks are based on characterizing the nearest neighbour spacing distribution (NNSD) of the eigenvalues of the adjacency matrix (CVETKOVIT et al., 1980; SARIKA et al., 2007). The NNSD represents the probability of finding neighbouring eigenvalues with any given spacing.

Summary

Introduction

The cell is a system of multiple interacting entities with specific functions, whose intrinsic complexity can be studied under mathematical frameworks such as networks. RMT approaches to threshold selection in complex networks are based on characterizing the nearest neighbour spacing distribution (NNSD) of the eigenvalues of the adjacency matrix (CVETKOVIT et al., 1980; SARIKA et al., 2007). The NNSD represents the probability of finding neighbouring eigenvalues with any given spacing. This distribution is expected to take a specific form, depending on the correlation structure underlying the eigenvalues. We revisit the method and demonstrate its applicability to GCNs. We use RMT-based simulations of several data sets, constructing random and systemic graphs, to show that the methodology can be applied to a wide range of gene expression data structures. Scripts were written in the R language (R CORE TEAM, 2017) and are available upon request.
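The contrast between the two spacing distributions can be illustrated with a short simulation. The sketch below is written in Python with NumPy rather than in the authors' R, and it is not their script: the function names (`goe_matrix`, `gde_matrix`, `unfolded_spacings`, `small_spacing_fraction`), the polynomial unfolding, and the 10% edge trimming are all assumptions made for illustration. It relies on two standard RMT results: GOE spacings approximately follow the Wigner surmise, P(s) = (pi/2) s exp(-pi s^2 / 4), so very small spacings are rare, whereas uncorrelated (GDE) eigenvalues give Poisson spacings, P(s) = exp(-s), so small spacings are common.

```python
import numpy as np


def goe_matrix(n, rng):
    """Symmetric matrix with Gaussian entries (GOE-like, hypothetical helper)."""
    a = rng.normal(size=(n, n))
    return (a + a.T) / 2.0


def gde_matrix(n, rng):
    """Diagonal matrix with independent Gaussian entries (GDE-like)."""
    return np.diag(rng.normal(size=n))


def unfolded_spacings(eigenvalues, degree=9, trim=0.1):
    """Nearest-neighbour spacings after a simple polynomial unfolding.

    Assumption: the smooth part of the eigenvalue staircase is
    approximated by a polynomial fit; other unfolding schemes exist.
    """
    ev = np.sort(np.asarray(eigenvalues, dtype=float))
    n = len(ev)
    staircase = np.arange(1, n + 1)  # empirical cumulative count N(lambda)
    smooth = np.polynomial.Polynomial.fit(ev, staircase, degree)
    unfolded = smooth(ev)
    k = int(trim * n)  # drop the spectrum edges, where the fit is poorest
    spacings = np.diff(unfolded[k:n - k])
    return spacings / spacings.mean()  # normalise to unit mean spacing


def small_spacing_fraction(spacings, cutoff=0.25):
    """Fraction of spacings below `cutoff`; low for GOE, high for Poisson."""
    return float(np.mean(spacings < cutoff))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 600
    s_goe = unfolded_spacings(np.linalg.eigvalsh(goe_matrix(n, rng)))
    s_gde = unfolded_spacings(np.linalg.eigvalsh(gde_matrix(n, rng)))
    print("GOE fraction of spacings < 0.25:", small_spacing_fraction(s_goe))
    print("GDE fraction of spacings < 0.25:", small_spacing_fraction(s_gde))
```

In a threshold-selection setting, the same spacing statistic would be recomputed on the adjacency matrix at each candidate threshold, looking for the point at which the NNSD moves from the GOE form towards the Poisson form. The fraction of small spacings is used here only as a compact summary; a goodness-of-fit test against the two reference distributions would be a more formal choice.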

Structure of adjacency matrices in RMT

RMT Methodology for threshold selection

Actual data

Reference method for threshold selection

Results and discussion