Abstract
Background: The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases.

Results: We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations.

Conclusion: The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows comparisons to be made between disease-containing databases and increases accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating the human genome with the Disease Ontology and GeneRIF dramatically increases the coverage of disease annotation of the human genome.
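The recall and precision measurements mentioned in the abstract can be illustrated with a minimal sketch. It assumes annotations are represented as (gene, disease) pairs and that matching is exact. The function name, the gene and disease labels, and the toy data are all hypothetical and are not taken from the Homayouni collection or from the authors' pipeline.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for two collections of (gene, disease) pairs."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Precision: fraction of predicted annotations that are in the reference set.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of reference annotations that were predicted.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
reference = {
    ("GENE_A", "disease 1"),
    ("GENE_A", "disease 2"),
    ("GENE_B", "disease 1"),
    ("GENE_C", "disease 3"),
    ("GENE_D", "disease 4"),
}
predicted = {
    ("GENE_A", "disease 1"),
    ("GENE_B", "disease 1"),
    ("GENE_C", "disease 3"),
    ("GENE_E", "disease 5"),  # not in the reference set: a false positive
}

precision, recall = precision_recall(predicted, reference)
```

In this toy case three of the four predicted pairs are correct and three of the five reference pairs are recovered, so a source with high precision but low recall would resemble a small `predicted` set that is almost entirely contained in `reference`.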
Highlights
The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases
The Disease Ontology is a disease-focused comprehensive subset of Unified Medical Language System (UMLS) and outside terms structured as a directed acyclic graph, similar to the structure of the Gene Ontology (GO) from the GO Consortium
Similar to the GO annotation, we provide a Disease Ontology (DO) annotation of the human genome; each annotation is supported by a peer-reviewed publication as required by GeneRIF
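To make the directed acyclic graph structure concrete, the sketch below stores a tiny ontology as a child-to-parents mapping and collects all ancestors of a term. The term names and the `ancestors` helper are hypothetical illustrations and are not actual Disease Ontology identifiers or part of the authors' tooling. The point is only that a term in a DAG may have more than one parent, unlike a node in a tree.

```python
# Hypothetical toy ontology: each term maps to its direct parent terms.
PARENTS = {
    "disease": [],
    "cancer": ["disease"],
    "organ system disease": ["disease"],
    "breast disease": ["organ system disease"],
    "breast cancer": ["cancer", "breast disease"],  # two parents: DAG, not tree
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


breast_cancer_ancestors = ancestors("breast cancer", PARENTS)
```

Under these assumptions, a gene annotated to the most specific term could also be retrieved by a query on any of that term's ancestors, which is one common way GO-style annotations are used. Whether the authors propagate annotations this way is not stated here.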
Summary
The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases. High-throughput genomics technologies generate a vast amount of data. Applying functional knowledge to genomic data is one method that has been used to reduce data complexity and establish biologically plausible arguments. These methods rely on a priori definition of gene sets, and the results necessarily depend on the strength of the annotations [1,2]. Few ontology-based tools are available for annotating genome-wide data with disease associations. The lack of ontology-based disease annotation prevents the application of disease knowledge to genomic data, hindering the discovery of gene-disease associations from high-throughput genomics technologies.