Discovering causal relationships among symptoms is a topical issue in the analysis of observational patient datasets. A Causal Bayesian Network (CBN) is a popular analytical framework for causal inference. Many algorithms can learn a Bayesian network from data, but the quality of the result depends on the complexity and thoroughness of the algorithm, and these methods do not consider prior expertise from authoritative sources. This article proposes a novel method that extracts prior causal knowledge from Authoritative Medical Ontologies (AMOs) and uses it to orient arcs in a CBN learned from observational patient data. Because AMOs are robust biomedical ontologies that encode the collective knowledge of the experts who created them, the ordering information they contain yields improved CBNs that provide additional insight into the disease domain. To demonstrate the method, we obtained prior causal ordering information among symptoms from three AMOs: (1) the Medical Dictionary for Regulatory Activities Terminology (MedDRA), (2) the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM), and (3) the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). This prior ontological knowledge is then used to orient arcs in a series of CBNs learned with the Max-Min Hill-Climbing (MMHC) algorithm from the patient dataset of the National Institute of Mental Health study Sequenced Treatment Alternatives to Relieve Depression (STAR*D). Six distinct CBNs are generated using MMHC: an unmodified Baseline model that uses only the algorithm, three CBNs oriented with ordered-variable pairs from MedDRA, ICD-10-CM, and SNOMED CT, respectively, and two oriented with ordered pairs from combinations of these AMOs. Orienting arcs with ordered-variable pairs substantially changes the network structure: the agreement between the Modified networks and the Baseline ranges from 50% to 90%.
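The arc-orientation step can be illustrated with a minimal Python sketch. It assumes the learned network is a list of directed (parent, child) arcs and that each AMO yields ordered (cause, effect) symptom pairs. The sketch reverses any learned arc that contradicts a prior pair, keeping the reversal only if the graph stays acyclic. This post-processing is an illustrative assumption, not necessarily how the authors feed the orderings into MMHC; structure learners such as the R package bnlearn can instead take such knowledge as whitelists or blacklists during the search. The function names and symptom labels below are hypothetical.

```python
from collections import defaultdict


def has_cycle(arcs):
    """Return True if the directed graph defined by (parent, child) arcs has a cycle."""
    children = defaultdict(list)
    nodes = set()
    for parent, child in arcs:
        children[parent].append(child)
        nodes.update((parent, child))

    state = {n: 0 for n in nodes}  # 0 = unvisited, 1 = on DFS stack, 2 = finished

    def visit(node):
        state[node] = 1
        for nxt in children[node]:
            if state[nxt] == 1:  # back edge means a directed cycle
                return True
            if state[nxt] == 0 and visit(nxt):
                return True
        state[node] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in nodes)


def orient_with_prior(arcs, ordered_pairs):
    """Hypothetical helper: reverse arcs that contradict a prior (cause, effect) pair.

    A reversal is undone if it would create a cycle. Pairs asserted in both
    directions (e.g. by conflicting ontologies) are ignored.
    """
    prior = set(ordered_pairs)
    oriented = list(arcs)
    for i in range(len(oriented)):
        u, v = oriented[i]
        if (v, u) in prior and (u, v) not in prior:
            oriented[i] = (v, u)
            if has_cycle(oriented):
                oriented[i] = (u, v)  # undo: reversal would break acyclicity
    return oriented


# Hypothetical learned arcs and ontology-derived orderings (illustrative only).
learned = [("fatigue", "insomnia"), ("insomnia", "poor_concentration")]
prior_pairs = [("insomnia", "fatigue")]
print(orient_with_prior(learned, prior_pairs))
# [('insomnia', 'fatigue'), ('insomnia', 'poor_concentration')]
```

Checking acyclicity after each reversal keeps the result a valid CBN. Pairs pooled from several ontologies can disagree, so skipping contradictory pairs avoids an arbitrary choice when AMOs are combined.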
The Modified network that uses ordering information from all three ontologies obtained an agreement of 50% (10 of 20 arcs exist in both the Baseline and Modified models) while maintaining predictive accuracy comparable to the Baseline. This indicates that the Modified CBN is consistent with both the causal claims in the AMOs and the observational STAR*D dataset. Furthermore, a qualitative analysis of these relationships against existing epidemiological research suggests that the Modified models discovered new, potentially causal relationships among symptoms while eliminating weaker edges.
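The agreement figure reduces to simple arithmetic. The sketch below assumes agreement is the number of arcs shared by the Baseline and Modified networks divided by the number of Baseline arcs, with arcs matched by direction unless `directed=False`. The abstract does not spell out the exact definition, so treat this as an assumption. The arc lists are synthetic placeholders, not the STAR*D networks, and `arc_agreement` is a hypothetical helper.

```python
def arc_agreement(baseline, modified, directed=True):
    """Share of Baseline arcs that also appear in the Modified network (assumed definition)."""

    def key(arc):
        # Directed comparison treats (u, v) and (v, u) as different arcs;
        # undirected comparison ignores orientation.
        return tuple(arc) if directed else frozenset(arc)

    base = {key(a) for a in baseline}
    mod = {key(a) for a in modified}
    if not base:
        raise ValueError("Baseline network has no arcs")
    return len(base & mod) / len(base)


# Synthetic example: 20 Baseline arcs; the Modified network keeps 10 and reverses 10.
baseline = [(f"X{i}", f"X{i + 1}") for i in range(20)]
modified = baseline[:10] + [(v, u) for u, v in baseline[10:]]
print(arc_agreement(baseline, modified))                  # 0.5
print(arc_agreement(baseline, modified, directed=False))  # 1.0
```

Running the comparison both ways separates disagreements caused by reversed arcs from those caused by added or removed edges.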