Abstract

ObjectiveTo develop a map of disease associations exclusively using two publicly available genetic sources: the catalog of single nucleotide polymorphisms (SNPs) from the HapMap, and the catalog of Genome Wide Association Studies (GWAS) from the NHGRI, and to evaluate it with a large, long-standing electronic medical record (EMR). MethodsA computational model, In Silico Bayesian Integration of GWAS (IsBIG), was developed to learn associations among diseases using a Bayesian network (BN) framework, using only genetic data. The IsBIG model (I-Model) was re-trained using data from our EMR (M-Model). Separately, another clinical model (C-Model) was learned from this training dataset. The I-Model was compared with both the M-Model and the C-Model for power to discriminate a disease given other diseases using a test dataset from our EMR. Area under receiver operator characteristics curve was used as a performance measure. Direct associations between diseases in the I-Model were also searched in the PubMed database and in classes of the Human Disease Network (HDN). ResultsOn the basis of genetic information alone, the I-Model linked a third of diseases from our EMR. When compared to the M-Model, the I-Model predicted diseases given other diseases with 94% specificity, 33% sensitivity, and 80% positive predictive value. The I-Model contained 117 direct associations between diseases. Of those associations, 20 (17%) were absent from the searches of the PubMed database; one of these was present in the C-Model. Of the direct associations in the I-Model, 7 (35%) were absent from disease classes of HDN. ConclusionUsing only publicly available genetic sources we have mapped associations in GWAS to a human disease map using an in silico approach. Furthermore, we have validated this disease map using phenotypic data from our EMR. Models predicting disease associations on the basis of known genetic associations alone are specific but not sensitive. Genetic data, as it currently exists, can only explain a fraction of the risk of a disease. Our approach makes a quantitative statement about disease variation that can be explained in an EMR on the basis of genetic associations described in the GWAS.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call