Abstract

Genetic studies (in particular linkage and association studies) identify chromosomal regions involved in a disease or phenotype of interest, but those regions often contain many candidate genes, only a few of which can be followed-up for biological validation. Recently, computational methods to identify (prioritize) the most promising candidates within a region have been proposed, but they are usually not applicable to cases where little is known about the phenotype (no or few confirmed disease genes, fragmentary understanding of the biological cascades involved). We seek to overcome this limitation by replacing knowledge about the biological process by experimental data on differential gene expression between affected and healthy individuals. Considering the problem from the perspective of a gene/protein network, we assess a candidate gene by considering the level of differential expression in its neighborhood under the assumption that strong candidates will tend to be surrounded by differentially expressed neighbors. We define a notion of soft neighborhood where each gene is given a contributing weight, which decreases with the distance from the candidate gene on the protein network. To account for multiple paths between genes, we define the distance using the Laplacian exponential diffusion kernel. We score candidates by aggregating the differential expression of neighbors weighted as a function of distance. Through a randomization procedure, we rank candidates by p-values. We illustrate our approach on four monogenic diseases and successfully prioritize the known disease causing genes.

Highlights

  • Genetic studies, including cytogenetic, linkage, and association studies, can identify chromosomal regions associated with a disease or phenotype of interest

  • We seek to overcome this limitation by replacing knowledge about the biological process by experimental data on differential gene expression between affected and healthy individuals

  • Considering the problem from the perspective of a gene/protein network, we assess a candidate gene by considering the level of differential expression in its neighborhood under the assumption that strong candidates will tend to be surrounded by differentially expressed neighbors

Read more

Summary

Introduction

Genetic studies, including cytogenetic, linkage, and association studies, can identify chromosomal regions associated with a disease or phenotype of interest. High-throughput ‘omics’ experiments identify genes or proteins implicated in a biological process of interest In both cases, biologists are often confronted with long lists of tens or hundreds of candidate genes among which they need to select a limited number of candidates for further validation. Biologists are often confronted with long lists of tens or hundreds of candidate genes among which they need to select a limited number of candidates for further validation This problem has been termed gene prioritization [1]. Computational methods for prioritizing candidate genes have been proposed They usually rank candidates by matching their information across multiple data sources against a profile derived from a set of genes, pathways, or biological processes already known to be involved in the phenotype. Considering the problem from the perspective of a gene/protein network, we assess a candidate gene by considering the level of differential expression in its neighborhood under the assumption that strong candidates will tend to be surrounded by differentially expressed neighbors

Objectives
Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.