Abstract

The automatic discovery of causal relationships among human genes can shed light on gene regulatory processes and guide drug repositioning. To this end, a computationally heavy method for causal discovery is distributed on a volunteer computing grid and, by taking advantage of variable subsetting and stratification, proves useful for expanding local gene regulatory networks. The input data are purely observational measures of transcript expression in human tissues and cell lines collected within the FANTOM project. The system relies on the BOINC platform and on optimized client code. The functional relevance of the results, measured by analyzing the annotations of the identified interactions, is significantly higher than that obtained with the simple Pearson correlation between the transcripts. Additionally, in 82 percent of cases the networks significantly overlap with known protein-protein interactions annotated in biological databases. In the two case studies presented, this approach has been used to expand the networks of genes associated with two severe human pathologies: prostate cancer and coronary artery disease. The method identified 22 and 36 genes, respectively, to be evaluated as novel targets for already approved drugs, demonstrating the applicability of the approach in pipelines aimed at drug repositioning.

Highlights

  • Bioinformaticians and computational biologists have been experiencing an increased need for computational resources in order to extract knowledge from the ever-growing amount of -omics data produced by the latest high-throughput technologies

  • Coronary artery disease (CAD) genes: to quantify the overlap between the ranked list of genes genetically associated with CAD and the ranked expansion lists obtained in the previous step as output of NES2RA, we used the Weighted Jaccard Similarity (WJS) [37], which we define below

  • We presented the systematic discovery of causal relationships between the transcripts of human genes and its application to prostate cancer and coronary artery disease, with the goal of drug repositioning
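The weighted Jaccard comparison of ranked gene lists mentioned above can be illustrated with a minimal sketch. It uses the common generalized form (sum of element-wise minima over sum of element-wise maxima); the exact definition taken from [37] is not shown in this excerpt and may differ. The linear rank-to-weight mapping in `rank_weights`, and the gene identifiers in the usage lines, are hypothetical choices made only for illustration.

```python
from typing import Dict, List


def rank_weights(ranked: List[str]) -> Dict[str, float]:
    """Hypothetical weighting: the top-ranked item gets 1.0, decreasing linearly."""
    n = len(ranked)
    return {gene: (n - i) / n for i, gene in enumerate(ranked)}


def weighted_jaccard(x: Dict[str, float], y: Dict[str, float]) -> float:
    """Generalized Jaccard: sum of minima divided by sum of maxima.

    Items missing from one list are treated as having weight 0 in that list.
    """
    genes = set(x) | set(y)
    num = sum(min(x.get(g, 0.0), y.get(g, 0.0)) for g in genes)
    den = sum(max(x.get(g, 0.0), y.get(g, 0.0)) for g in genes)
    return num / den if den > 0 else 0.0


# Usage with placeholder identifiers (not real results):
associated = rank_weights(["G1", "G2"])   # ranked disease-associated genes
expansion = rank_weights(["G2", "G3"])    # ranked expansion list
similarity = weighted_jaccard(associated, expansion)
```

Identical lists score 1 and disjoint lists score 0, so the measure rewards both shared membership and agreement in rank.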


Summary

INTRODUCTION

Bioinformaticians and computational biologists have been experiencing an increased need for computational resources in order to extract knowledge from the ever-growing amount of -omics data produced by the latest high-throughput technologies.

PERFORMANCE

The OneGenE application running on the gene@home BOINC server has the following main sets of parameters: I (the number of iterations), D (the subset dimensions, i.e. the tile sizes), and A (the set of significance levels α to be used in the statistical test of the PC algorithm). These parameters need to be chosen carefully by balancing the execution speed of the application, the accuracy of the results, and the statistical errors; their values depend on the input expression dataset. This is a potential advantage over the use of centralized computing resources.
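To make the role of the three parameter sets concrete, here is a minimal sketch of how work units might be enumerated. It is not the OneGenE implementation. It assumes that every combination of iteration, tile size, and α triggers a fresh random split of the transcripts into tiles, and that each tile also contains the seed genes whose network is being expanded. The function names and the dictionary layout are hypothetical.

```python
import itertools
import random
from typing import Dict, Iterator, List, Sequence


def make_tiles(transcripts: Sequence[str], seed_genes: Sequence[str],
               tile_size: int, rng: random.Random) -> List[List[str]]:
    """Randomly split the non-seed transcripts into tiles of at most tile_size.

    Assumption: every tile also carries all seed genes.
    """
    free = tile_size - len(seed_genes)
    if free <= 0:
        raise ValueError("tile_size must exceed the number of seed genes")
    seed_set = set(seed_genes)
    others = [t for t in transcripts if t not in seed_set]
    rng.shuffle(others)
    # The last tile may be smaller when the split is not exact.
    return [list(seed_genes) + others[i:i + free]
            for i in range(0, len(others), free)]


def work_units(transcripts: Sequence[str], seed_genes: Sequence[str],
               iterations: int, tile_sizes: Sequence[int],
               alphas: Sequence[float], seed: int = 0) -> Iterator[Dict]:
    """Yield one unit per tile for every (iteration, tile size, alpha) triple."""
    rng = random.Random(seed)
    for i, d, alpha in itertools.product(range(iterations), tile_sizes, alphas):
        for tile in make_tiles(transcripts, seed_genes, d, rng):
            yield {"iteration": i, "tile_size": d, "alpha": alpha, "tile": tile}


# Usage with placeholder transcript names:
names = [f"T{k}" for k in range(10)]
units = list(work_units(names, ["T0"], iterations=2,
                        tile_sizes=[4], alphas=[0.05, 0.01]))
```

Under these assumptions the number of units grows with the product of the sizes of I, D, and A, which is one way to see why the choice of parameters trades execution speed against accuracy.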

VALIDATION
CASE STUDY 1
PROSTATE CANCER GENES SELECTION AND EXPANSIONS WITH NES2RA
SELECTION OF TARGET GENES AND FUNCTIONAL ANALYSIS
SINGLE-GENE NES2RA EXPANSION OF THE TARGET GENES
COMPARISON OF EXPANDED LISTS WITH
Findings
CONCLUSION