Abstract
Gene expression data have been used to infer gene-gene networks (GGN), where an edge between two genes implies the conditional dependence of these two genes given all the other genes. Such gene-gene networks are often referred to as gene regulatory networks, since they may reveal expression regulation. Most existing methods for identifying GGN employ penalized regression with an L1 (lasso), L2 (ridge), or elastic net penalty, the last of which spans the range from the L1 to the L2 penalty. However, for high-dimensional gene expression data, a penalty that spans the range between the L0 and L1 penalties, such as the log penalty, is often needed for variable selection consistency. Thus, we develop a novel method that employs the log penalty within the framework of an earlier network identification method, space (Sparse PArtial Correlation Estimation), and implement it in an R package, space-log. We show that space-log is computationally efficient (source code implemented in C) and performs well compared with other methods, particularly for networks with hubs. Space-log is open source and available on GitHub at https://github.com/wuqian77/SpaceLog.
Highlights
The objective of this paper is to introduce a novel method that constructs gene-gene networks (GGN) from high-dimensional gene expression data.
We propose a new statistical method to estimate GGN by implementing the log penalty for the space approach, and we refer to our method as space-log.
We evaluated the performance of the methods by the following metrics: number of false positives (FP), number of false negatives (FN), FP+FN, F1 score, false discovery rate (FDR), and true positive rate.
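The metrics listed above can be illustrated with a minimal sketch that compares an estimated network against a known one. This is not the authors' evaluation code: the function name `edge_metrics`, the adjacency-matrix representation, and the toy four-gene network are assumptions made for illustration. Each undirected edge is counted once by looking only at the upper triangle.

```python
import numpy as np

def edge_metrics(true_adj, est_adj):
    """Edge-recovery metrics for undirected networks.

    Hypothetical helper: both inputs are symmetric 0/1 adjacency
    matrices over the same genes; the diagonal is ignored.
    """
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    # Upper triangle only, so each gene pair is counted once.
    iu = np.triu_indices_from(true_adj, k=1)
    t, e = true_adj[iu], est_adj[iu]

    tp = int(np.sum(t & e))    # true edges that were detected
    fp = int(np.sum(~t & e))   # detected edges that are not real
    fn = int(np.sum(t & ~e))   # true edges that were missed

    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "FP+FN": fp + fn,
            "FDR": fdr, "TPR": tpr, "F1": f1}

def adjacency(n, edges):
    """Build a symmetric adjacency matrix from a list of (i, j) pairs."""
    a = np.zeros((n, n), dtype=int)
    for i, j in edges:
        a[i, j] = a[j, i] = 1
    return a

# Toy example: gene 0 is a hub connected to genes 1, 2 and 3.
truth = adjacency(4, [(0, 1), (0, 2), (0, 3)])
# Assumed estimate: two hub edges found, one missed, one spurious edge.
estimate = adjacency(4, [(0, 1), (0, 2), (1, 2)])

metrics = edge_metrics(truth, estimate)
print(metrics)
```

In this toy case the estimate has TP = 2, FP = 1 and FN = 1, so FDR is 1/3 and both TPR and F1 are 2/3.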
Summary
The objective of this paper is to introduce a novel method that constructs gene-gene networks (GGN) from high-dimensional gene expression data. Graphical Lasso improves on neighborhood selection by providing a maximum likelihood estimate of the partial correlation matrix. The space method exploits the symmetry of the partial correlation matrix to improve estimation accuracy. It avoids a potential conflict in neighborhood selection, in which Yi is selected as a neighbor of Yj but Yj is not selected as a neighbor of Yi, so that one has to make a post-hoc decision about whether Yi and Yj are connected. Penalties in the range of L0 to L1 are often needed to improve the accuracy of variable selection for high-dimensional gene expression data [4]. We propose a new statistical method to estimate GGN by implementing the log penalty for the space approach, and we refer to our method as space-log.
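To make the idea of a penalty lying between L0 and L1 concrete, here is a minimal sketch of one common normalized form of a log penalty. The text above does not state the exact functional form used in space-log, so the formula `log(1 + |b|/tau) / log(1 + 1/tau)`, the function name `log_penalty`, and the tuning parameter `tau` are all assumptions for illustration. Under this form, a large `tau` makes the penalty close to `|b|` (L1-like), while a small `tau` makes it close to an indicator of `b != 0` (L0-like).

```python
import numpy as np

def log_penalty(beta, tau):
    """Normalized log penalty (assumed form, not taken from the paper).

    p(b) = log(1 + |b| / tau) / log(1 + 1 / tau)

    The normalization makes p(1) = 1 for every tau > 0, so curves for
    different tau values are directly comparable.
    """
    b = np.abs(np.asarray(beta, dtype=float))
    return np.log1p(b / tau) / np.log1p(1.0 / tau)

coefs = np.array([0.0, 0.1, 0.5, 1.0])

# Large tau: the penalty grows roughly linearly in |b|, like L1.
l1_like = log_penalty(coefs, tau=1e6)

# Small tau: every nonzero coefficient costs roughly the same, like L0.
l0_like = log_penalty(coefs, tau=1e-12)

print("L1-like:", l1_like)
print("L0-like:", l0_like)
```

The practical consequence is that, compared with the lasso, a penalty of this shape shrinks large coefficients less while still pushing small ones to zero, which is the behavior the summary appeals to for variable selection in high dimensions.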