Abstract

Identifying disease-associated genes is important for studying the pathogenic mechanisms of complex diseases. Recent models for disease gene prediction are predominantly based on molecular expression data and networks, such as gene expression, protein expression, co-expression networks and protein-protein interaction networks. One limitation of these methods is that they do not exploit knowledge from annotated gene sets, which represent known pathways or sets of functionally related genes. In this study, we propose a new approach that predicts disease-associated genes by integrating annotated gene set data from the Molecular Signatures Database (MSigDB). It first represents and integrates the different types of annotated gene sets in MSigDB in the form of a signal matrix. It then uses the signal matrix as gene features to train the disease gene prediction model. We compare our method with existing methods in predicting genes for five complex diseases, and the results show that our method outperforms the compared methods. Further, we perform a case study on autism spectrum disorder (ASD), in which the predicted ASD genes are found to be associated with ASD, as supported by statistical analysis of biological networks and by independent ASD studies. The source code, prediction results and datasets are publicly available at https://github.com/genemine/GSI.git.
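The abstract does not specify how the signal matrix is built or which classifier is trained on it. As a minimal sketch under stated assumptions, the two steps can be illustrated as follows: each gene is encoded as a binary vector of gene-set memberships (a hypothetical stand-in for the signal matrix), and a plain logistic regression (a stand-in, not necessarily the model used in this work) is trained on those features. The gene sets, gene names and labels are hypothetical toy data rather than MSigDB content, and the helper names `build_signal_matrix`, `train_logistic` and `predict` are illustrative only.

```python
import math

# Hypothetical toy data: three gene sets and six genes (not real MSigDB content).
GENE_SETS = {
    "SET_A": {"G1", "G2", "G3"},
    "SET_B": {"G4", "G5"},
    "SET_C": {"G1", "G4", "G6"},
}
GENES = ["G1", "G2", "G3", "G4", "G5", "G6"]
LABELS = [1, 1, 1, 0, 0, 0]  # 1 = known disease gene, 0 = non-disease gene (toy labels)


def build_signal_matrix(gene_sets, genes):
    """Build a gene-by-gene-set matrix; an entry is 1.0 if the gene belongs to the set.

    Assumption: a binary membership encoding stands in for the signal matrix.
    """
    set_names = sorted(gene_sets)
    matrix = [[1.0 if gene in gene_sets[name] else 0.0 for name in set_names]
              for gene in genes]
    return matrix, set_names


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, row):
    """Estimated probability that a gene (given its feature row) is disease-associated."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by full-batch gradient descent (stand-in model)."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            # Gradient of the log-loss with respect to the logit is (p - label).
            err = predict(weights, bias, row) - label
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


X, SET_NAMES = build_signal_matrix(GENE_SETS, GENES)
WEIGHTS, BIAS = train_logistic(X, LABELS)
for gene, row in zip(GENES, X):
    print(gene, round(predict(WEIGHTS, BIAS, row), 3))
```

In practice, the membership rows could be filled from MSigDB gene set collections, which are distributed as GMT files, and a real model would be evaluated on held-out genes rather than on the training genes scored here.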
