Abstract
Recovering gene regulatory networks from expression data is a challenging problem in systems biology, and solving it provides valuable information on the regulatory mechanisms of cells. A number of algorithms based on computational models are currently used to recover network topology. However, most of these algorithms have limitations. For example, many models tend to be complicated because of the “large p, small n” problem, in which the number of genes (p) far exceeds the number of samples (n). In this paper, we propose a novel regulatory network inference method called the maximum-relevance and maximum-significance network (MRMSn) method, which converts the problem of recovering a network into the problem of selecting the regulator genes for each gene. To solve the latter problem, we present an algorithm that is based on information theory and selects the regulator genes for a specific gene by maximizing relevance and significance. A first-order incremental search algorithm is used to search for regulator genes. Finally, a strict constraint is applied to adjust all of the regulatory relationships according to the obtained regulator genes, which yields the complete network structure. We applied our method to five different datasets and compared it with five state-of-the-art information-theoretic methods for network inference. The results confirm the effectiveness of our method.
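The regulator-selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact definitions, so the sketch assumes discretized expression data, takes relevance to be the mutual information between a candidate gene and the target, and takes significance to be the average conditional mutual information given each already-selected regulator. The function names, the `threshold` stopping rule, and the `max_regulators` cap are hypothetical, and the final constraint step that adjusts the regulatory relationships is not shown.

```python
import numpy as np


def entropy(*cols):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def select_regulators(data, target, max_regulators=3, threshold=0.05):
    """Greedy incremental search for regulators of one target gene.

    data: (samples, genes) array of discretized expression levels.
    Scoring is an ASSUMPTION made for illustration:
    score = relevance + significance.
    """
    y = data[:, target]
    candidates = [g for g in range(data.shape[1]) if g != target]
    selected = []
    while candidates and len(selected) < max_regulators:
        best, best_score, best_sig = None, -np.inf, 0.0
        for g in candidates:
            x = data[:, g]
            relevance = mutual_info(x, y)
            if selected:
                # Information g still adds about the target once a
                # selected regulator is known, averaged over regulators.
                significance = float(np.mean(
                    [cond_mutual_info(x, y, data[:, s]) for s in selected]
                ))
            else:
                significance = relevance
            score = relevance + significance
            if score > best_score:
                best, best_score, best_sig = g, score, significance
        if best_sig < threshold:  # hypothetical stopping rule
            break
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: gene 2 is a noisy copy of gene 0; gene 1 is independent.
rng = np.random.default_rng(0)
gene0 = rng.integers(0, 2, 500)
gene1 = rng.integers(0, 2, 500)
gene2 = gene0 ^ (rng.random(500) < 0.05)
toy_data = np.column_stack([gene0, gene1, gene2])
toy_regulators = select_regulators(toy_data, target=2)
```

Running the selection once per target gene would give a candidate regulator set for every gene. Under the assumptions above, the conditional term is what keeps a gene that only repeats information already supplied by a selected regulator from being added.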
Highlights
The rapid development of high-throughput technologies has produced extensive gene expression data, and mining useful cell function information from these data has become a crucial goal in systems biology [1,2]
Inspired by the network model based on information theory and a feature selection method known as maximum relevance-maximum significance (MRMS) [37], we propose a novel information-theoretic network inference method based on MRMS, in which the problem of network recovery is converted into a process whereby the regulator genes for each target gene are selected
We found that the maximum-relevance and maximum-significance network (MRMSn) method, CLR, ARACNE and MIDER can infer a network topology identical to that of the true network, whereas MRNET and MI3 produce redundant edges
Summary
The rapid development of high-throughput technologies has produced extensive gene expression data, and mining useful cell function information from these data has become a crucial goal in systems biology [1,2]. Specific physiological activity in cells occurs at the gene expression level. This activity results from the interaction of a large number of genes and biological molecules and is not controlled by any single gene alone. The sophisticated regulatory relationships between genes are often depicted in the form of gene regulatory networks. Gene network inference is therefore crucial to identifying regulatory relationships and understanding regulatory mechanisms [3,4].