Abstract
Background: Statistical interactions between disease-associated loci of complex genetic diseases suggest that genes from these regions are involved in a common mechanism impacting, or impacted by, the disease. The computational problem we address is to discover relationships among genes from these interacting regions that may explain the observed statistical interaction and the role of these genes in the disease phenotype.
Results: We describe a heuristic algorithm for generating hypothetical gene relationships from loci associated with a complex disease phenotype. This approach, called Prioritizing Disease Genes by Analysis of Common Elements (PDG-ACE), mines biomedical keywords from text descriptions of genes and uses them to relate genes close to disease-associated loci. A keyword common to, and significantly over-represented in, a pair of gene descriptions may represent a preliminary hypothesis about the biological relationship between the genes, and suggest the role the genes play in the disease phenotype.
Conclusion: Our experimentation shows that the approach finds previously published relationships, while failing to find relationships that don't exist. The results also indicate that the approach is robust to differences in keyword vocabulary. We outline a brief case study in which results from a recently published Type 2 Diabetes association study are used to identify potential hypotheses.
Highlights
Statistical interactions between disease-associated loci of complex genetic diseases suggest that genes from these regions are involved in a common mechanism impacting, or impacted by, the disease.
In the study of the genetics of complex diseases such as Bipolar Disorder, we see statistical interactions between disease-associated loci such as the interacting linkage peaks depicted in Figure 1, or interactions between pairs of SNPs in a genome-wide association study.
Linkage peaks with statistical interaction suggest pairs of regions of the genome in which genes that co-contribute to a disease may be found.
Summary
This paper describes our strategy and its implementation in a tool called PDG-ACE (Prioritizing Disease Genes by Analysis of Common Elements). All three genes refer to "marrow" (corrected p-value 0.033), consistent with bone disease, so the true genetic interaction may have been hidden in the previous study but revealed by PDG-ACE. In both the breast cancer and osteoporosis studies, evidence is consistent with gene family effects on the phenotype, as expected in complex diseases. These validation experiments show that findings from PDG-ACE are generally consistent with the strength of prior evidence, as seen by comparing p-values found in the interaction analyses and the pattern of significant keywords found by PDG-ACE. For the GSTM1-CYP2E1 locus pair at 1 KBP in the second breast cancer study, the common over-represented keywords for the MeSH vocabulary are: "cyp2e1", "ethanol", "smoke", "area", "stomach", "toxicity", and "xenobiotics". GeneGo finds that all six entities fit into the GO process GO:0050794, and the input set is significantly over-represented in this process, with a p-value < 0.01.
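The idea of a keyword that is common to, and over-represented in, a pair of gene descriptions can be illustrated with a small sketch. This is not the PDG-ACE implementation: the gene names, descriptions, and vocabulary are invented, the null model (two genes drawn at random from the background set) is an assumption, and the Bonferroni correction over the shared keywords is a stand-in for whatever correction the paper applies.

```python
from math import comb

# Hypothetical toy data: gene names and descriptions are invented for illustration.
DESCRIPTIONS = {
    "GENE_A": "protein expressed in bone marrow",
    "GENE_B": "marrow stromal protein",
    "GENE_C": "protein kinase",
    "GENE_D": "membrane protein",
    "GENE_E": "ethanol metabolism protein",
    "GENE_F": "transport protein",
    "GENE_G": "ribosomal protein",
    "GENE_H": "heat shock protein",
    "GENE_I": "zinc finger protein",
    "GENE_J": "secreted protein",
}
VOCABULARY = {"marrow", "protein", "ethanol"}


def keyword_set(description, vocabulary):
    """Return the vocabulary terms that occur as whole words in a description."""
    words = set(description.lower().split())
    return {kw for kw in vocabulary if kw in words}


def pair_pvalue(n_genes, n_with_keyword):
    """Chance that two genes drawn at random (without replacement) both mention the keyword."""
    if n_genes < 2:
        raise ValueError("need at least two genes in the background set")
    return comb(n_with_keyword, 2) / comb(n_genes, 2)


def common_overrepresented(gene_a, gene_b, descriptions, vocabulary, alpha=0.05):
    """List (keyword, corrected p-value) for keywords shared by both genes and below alpha."""
    keywords = {g: keyword_set(d, vocabulary) for g, d in descriptions.items()}
    shared = keywords[gene_a] & keywords[gene_b]
    n_genes = len(descriptions)
    hits = []
    for kw in shared:
        n_with = sum(1 for kws in keywords.values() if kw in kws)
        # Assumed correction: Bonferroni over the number of shared keywords tested.
        corrected = min(1.0, pair_pvalue(n_genes, n_with) * len(shared))
        if corrected < alpha:
            hits.append((kw, corrected))
    return sorted(hits, key=lambda item: item[1])


if __name__ == "__main__":
    for kw, p in common_overrepresented("GENE_A", "GENE_B", DESCRIPTIONS, VOCABULARY):
        print(f"{kw}: corrected p = {p:.4f}")
```

In this toy background, "protein" is shared by the pair but appears in every description, so it carries no signal; "marrow" is shared and rare, so it survives the correction. A realistic version would draw the vocabulary from a controlled source such as MeSH and the descriptions from curated gene records.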