Abstract
Many diseases have complex genetic causes, where a set of alleles can affect the propensity of getting the disease. The identification of such disease genes is important for understanding the mechanistic and evolutionary aspects of pathogenesis, improving diagnosis and treatment of the disease, and aiding drug discovery. Current genetic studies typically identify chromosomal regions associated with specific diseases. But picking out an unknown disease gene from hundreds of candidates located on the same genomic interval is still challenging. In this study, we propose an approach to prioritize candidate genes by integrating data on gene expression levels, protein-protein interaction strength and known disease genes. Our method is based on only two simple, biologically motivated assumptions: that a gene is a good disease-gene candidate if it is differentially expressed in cases and controls, or that it is close to other disease-gene candidates in its protein interaction network. We tested our method on 40 diseases in 58 gene expression datasets of the NCBI Gene Expression Omnibus database. On these datasets our method is able to predict unknown disease genes as well as identify pleiotropic genes involved in the physiological cellular processes of many diseases. Our study not only provides an effective algorithm for prioritizing candidate disease genes but also offers a way to discover phenotypic interdependency, co-occurrence and shared pathophysiology between different disorders.
Highlights
Many diseases need complex genetic and environmental factors to occur
The first parameter w sets the relative importance of the difference in expression level and closeness in the protein interaction network
The optimum is reached for (w,g) = (0.005,39), which is well in the interior of the parameter space in both dimensions: 0 ≤ w < 0.01 and 0 ≤ g < 0.01. This means that both the protein interaction network and the differential expression contain information that can be exploited in disease-gene ranking, as hypothesized
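As a rough illustration of how a weight like w could trade off differential expression against closeness in the protein interaction network, here is a minimal sketch. It is a toy under stated assumptions, not the paper's actual scoring rule: the function name `rank_candidates`, the update s = d + w·A·s, the fixed iteration count, and the toy genes and values are all hypothetical. The second parameter g is left out because its role is not defined in this text.

```python
def rank_candidates(diff_expr, interactions, w, n_iter=50):
    """Rank genes by a hypothetical score s = d + w * A * s.

    diff_expr:    dict gene -> differential expression magnitude (d)
    interactions: dict (gene_a, gene_b) -> interaction strength (entries of A);
                  every gene named here must also appear in diff_expr
    w:            weight of the network term relative to expression (assumed form)
    """
    # Build a symmetric, weighted neighbor list from the interaction pairs.
    neighbors = {gene: [] for gene in diff_expr}
    for (a, b), strength in interactions.items():
        neighbors[a].append((b, strength))
        neighbors[b].append((a, strength))

    # Start from the expression signal alone, then repeatedly add the
    # weighted scores of interaction partners (fixed-point iteration).
    scores = dict(diff_expr)
    for _ in range(n_iter):
        scores = {
            gene: diff_expr[gene]
            + w * sum(strength * scores[nb] for nb, strength in neighbors[gene])
            for gene in diff_expr
        }

    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Toy data (invented for illustration only).
toy_expr = {"A": 2.0, "B": 0.1, "C": 0.5, "D": 0.4}
toy_ppi = {("A", "B"): 1.0, ("C", "D"): 0.2}

ranking_expr_only, _ = rank_candidates(toy_expr, toy_ppi, w=0.0)
ranking_with_net, _ = rank_candidates(toy_expr, toy_ppi, w=0.3)
print(ranking_expr_only)
print(ranking_with_net)
```

In this toy setting, gene B has almost no expression difference but interacts strongly with the top candidate A, so it moves up the ranking once the network term is switched on. The iteration only settles if w is small relative to the interaction strengths, which is one plausible reason for keeping a weight of this kind small.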
Summary
Finding the genetic factors is important for both medical reasons (aiding in drug discovery and personalized treatments) and scientific reasons (understanding mechanistic and evolutionary aspects of pathogenesis). Genetic approaches, such as linkage analysis (connecting loci with a tendency to be inherited together) and association studies (mapping correlation between alleles at different loci), have uncovered plenty of links between diseases and particular chromosomal regions [1]. Differences in expression levels are detected primarily by microarray studies [2,3,4,5,6]. Another phenomenon pointed out by previous studies [7,8,9] is that genes associated with the same disorder tend to share common functional features, reflected in the tendency of their protein products to interact with each other. This difference, as we will see, simplifies our method both conceptually and algorithmically, and makes it a better tool for inferring pathogenic interactions invisible in microarray data.