AD Biological Domains—an evolving computational resource for big data classification in AD from the TREAT‐AD Consortium

Jesse C Wiley, Sruthi Ganesh, Allan I Levey, Anna K Greenwood, Gregory W Carter, Gregory A Cary

doi:10.1002/alz.082181

Abstract

Background: Alzheimer’s disease (AD) is a highly heterogeneous disease with a multitude of subtypes and interacting biological systems that emerge in a complex pattern of pathology, neurodegeneration, and cognitive decline. To help organize large‐scale data analysis in AD in an unbiased manner, we developed a computational model of endophenotypes associated with AD. We call these models the AD Biological Domains.

Method: The Common Alzheimer’s Disease Research Ontology (CADRO) was leveraged in association with literature mining to identify 19 parent endophenotypes that are largely sufficient to cover AD‐related pathology. These endophenotypes are instantiated as large sets of manually curated Gene Ontology (GO) terms and linked genes. This framework enables researchers to move unambiguously from an identified gene to a larger disease‐associated biological phenotype.

Result: We defined 19 biological domains in terms of constitutive sets of GO terms. A gene‐set enrichment analysis confirmed that these 19 biological domains cover almost all AD‐enriched GO terms, barring those that fall outside the reach of the biological domains because they lack specificity in biological implication. The size of the biological domains ranges from over a thousand GO terms for ‘Proteostasis’ and ‘Synapse’ to just over a dozen for ‘APP Metabolism’ and ‘Tau homeostasis’. There is little cross‐domain sharing of GO terms, as the biological domains are effectively siloed in their definition. At the level of gene annotation, however, we see high levels of promiscuity across domains, potentially providing a molecular trail to find interdependencies between domains. These biological domains are currently being used across several NIA‐funded consortia, including TREAT‐AD, AMP‐AD, and MODEL‐AD, and have been adopted by partnered resources such as the AD‐Atlas and Agora. The full list of biological domains and associated GO terms and genes is openly accessible through the AD Knowledge Portal.

Conclusion: The biological domains are an open resource designed to provide a community‐driven definition of discrete areas of AD biology. They allow AD investigators to use a standard set of computationally accessible resources to characterize emerging transcriptomic and proteomic datasets in a unified manner, making cross‐study comparisons more informative.
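
The domain structure described in this abstract (each domain a set of GO terms, each GO term linked to genes) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the GO term IDs and gene symbols are placeholders rather than real annotations, the function names are hypothetical, and the Jaccard index is only one possible way to quantify overlap, not necessarily the measure the authors used. Only the domain names come from the abstract.

```python
from itertools import combinations

# Hypothetical toy data: placeholder GO term IDs and gene symbols,
# NOT real annotations from the AD Knowledge Portal.
DOMAIN_TO_GO = {
    "Proteostasis": {"GO:TERM_A", "GO:TERM_B"},
    "Synapse": {"GO:TERM_C", "GO:TERM_D"},
    "APP Metabolism": {"GO:TERM_E"},
}
GO_TO_GENES = {
    "GO:TERM_A": {"GENE1", "GENE2"},
    "GO:TERM_B": {"GENE3"},
    "GO:TERM_C": {"GENE1", "GENE4"},
    "GO:TERM_D": {"GENE5"},
    "GO:TERM_E": {"GENE1", "GENE3"},
}


def genes_by_domain(domain_to_go, go_to_genes):
    """Collect every gene annotated to any GO term within each domain."""
    return {
        domain: set().union(*(go_to_genes.get(term, set()) for term in terms))
        for domain, terms in domain_to_go.items()
    }


def domains_for_gene(gene, domain_to_go, go_to_genes):
    """Move from a single gene to the domains it is annotated to."""
    domain_genes = genes_by_domain(domain_to_go, go_to_genes)
    return sorted(d for d, genes in domain_genes.items() if gene in genes)


def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(sets_by_domain):
    """Jaccard overlap for every unordered pair of domains."""
    return {
        (d1, d2): jaccard(sets_by_domain[d1], sets_by_domain[d2])
        for d1, d2 in combinations(sorted(sets_by_domain), 2)
    }


# Term-level overlap uses the GO term sets directly;
# gene-level overlap uses the genes reachable through those terms.
term_overlap = pairwise_overlap(DOMAIN_TO_GO)
gene_overlap = pairwise_overlap(genes_by_domain(DOMAIN_TO_GO, GO_TO_GENES))

if __name__ == "__main__":
    print(domains_for_gene("GENE1", DOMAIN_TO_GO, GO_TO_GENES))
    for pair in term_overlap:
        print(pair, "terms:", term_overlap[pair], "genes:", round(gene_overlap[pair], 2))
```

In this toy setup the domains share no GO terms, yet a single gene can map to several domains, which mirrors the pattern the Result paragraph reports: siloed term definitions with promiscuous gene annotation.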

Full Text