A Novel Calibration Step in Gene Co-Expression Network Construction.

Niloofar Aghaieabiane, Ioannis Koutis

doi:10.3389/fbinf.2021.704817

Open Access

https://doi.org/10.3389/fbinf.2021.704817

Abstract

High-throughput technologies such as DNA microarrays and RNA-sequencing are used to measure the expression levels of large numbers of genes simultaneously. To support the extraction of biological knowledge, individual gene expression levels are transformed into Gene Co-expression Networks (GCNs). In a GCN, nodes correspond to genes, and the weight of the connection between two nodes is a measure of similarity in the expression behavior of the two genes. In general, GCN construction and analysis includes three steps: 1) calculating a similarity value for each pair of genes, 2) using these similarity values to construct a fully connected weighted network, and 3) finding clusters of genes in the network, commonly called modules. The specific implementation of these three steps can significantly impact the final output and the downstream biological analysis. GCN construction is a well-studied topic. Existing algorithms rely on relatively simple statistical and mathematical tools to implement these steps. Currently, the software package WGCNA appears to be the most widely accepted standard. We hypothesize that the raw features provided by sequencing data can be leveraged to extract modules of higher quality. We introduce a novel preprocessing step that in effect calibrates the expression levels of individual genes before pairwise similarities are computed. Further, the similarity is computed as an inner product of positive vectors. In experiments, this provides a significant improvement over WGCNA, as measured by aggregate p-values of the gene ontology term enrichment of the computed modules.

Highlights

The availability of high-throughput technologies like DNA microarrays (Reshef et al., 2011) or RNA-sequencing (RNA-seq) (Hrdlickova et al., 2017) has motivated several approaches for developing a computational understanding of genes and their functionalities.
We perform two additional types of comparisons in order to demonstrate that the modules computed by the calibration-based methods can differ significantly from those computed by WGCNA.
As highlighted in the title of the original work (Zhang and Horvath, 2005), WGCNA is a versatile general framework that can be instantiated in multiple ways into concrete data-processing pipelines.

Summary

Introduction

The availability of high-throughput technologies like DNA microarrays (Reshef et al., 2011) or RNA-sequencing (RNA-seq) (Hrdlickova et al., 2017) has motivated several approaches for developing a computational understanding of genes and their functionalities. The Petal framework (Petereit et al., 2016) instantiates the similarity and adjacency steps of GCN construction as follows: 1) Similarity: computation of the Spearman correlation; 2) Adjacency: construction of an initial network using the signum function, followed by further modification so that the network satisfies certain scale-free and small-world criteria (Barabási and Albert, 1999). On the other hand, WeiGhted Correlation Network Analysis (WGCNA), which is the most widely accepted framework for GCN construction, takes the following steps: 1) Similarity: computation of the Pearson correlation; 2) Adjacency: conversion of negative correlation values into positive ones, raising the coefficients to a power so that the resulting network satisfies the scale-free criterion, and adding information about second-order neighborhoods of the network in the form of what is called the Topological Overlap Measure (TOM) of the network (Zhang and Horvath, 2005; Langfelder and Horvath, 2008).
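The WGCNA-style similarity and adjacency steps described above can be illustrated with a minimal sketch. It is not the WGCNA package itself, and it rests on several assumptions: negative correlations are made positive by taking the absolute value (the "unsigned" variant), the power beta is fixed at a hypothetical value of 6 rather than chosen by a scale-free fit, and the TOM is written as TOM_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), with k_i the sum of the adjacencies of gene i. The function names and the toy expression values are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def adjacency(expr, beta=6):
    """Unsigned soft-threshold adjacency: a_ij = |cor(i, j)| ** beta, a_ii = 0.

    beta=6 is an assumed fixed value; it is not tuned for scale-freeness here.
    """
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    g = len(adj)
    k = [sum(row) for row in adj]          # connectivity of each gene
    tom = [[1.0] * g for _ in range(g)]    # diagonal is 1 by convention
    for i in range(g):
        for j in range(i + 1, g):
            # Shared-neighbor term; terms with u == i or u == j vanish
            # because the adjacency diagonal is zero.
            shared = sum(adj[i][u] * adj[u][j] for u in range(g))
            denom = min(k[i], k[j]) + 1.0 - adj[i][j]
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / denom
    return tom


# Toy data: rows are genes, columns are samples (values are made up).
expr = [
    [1.0, 2.0, 3.0, 4.0],  # gene A
    [2.0, 4.0, 6.0, 8.0],  # gene B: perfectly correlated with A
    [4.0, 3.0, 2.0, 1.0],  # gene C: perfectly anti-correlated with A
    [1.0, 3.0, 2.0, 4.0],  # gene D: partially correlated with A
]
adj = adjacency(expr)
tom = topological_overlap(adj)
```

In this toy example, genes A and C receive the same adjacency as genes A and B, because the absolute value discards the sign of the correlation. Raising to the power beta shrinks the weaker A-D correlation much more than the strong ones, and the TOM then raises or lowers each pairwise weight according to how many strong neighbors the two genes share.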

Objectives

Results

Conclusion