Handling hybrid and missing data in constraint-based causal discovery to study the etiology of ADHD

Elena Sokolova, Tom Claassen, Daniel Von Rhein, Jilly Naaijen, Tom Heskes, Jan Buitelaar, Perry Groot

doi:10.1007/s41060-016-0034-x

Abstract

Causal discovery is an increasingly important method for data analysis in the field of medical research. In this paper, we consider two challenges in causal discovery that occur very often when working with medical data: a mixture of discrete and continuous variables and a substantial amount of missing values. To the best of our knowledge, there are no methods that can handle both challenges at the same time. We develop a new method that handles both, based on the assumptions that data are missing at random and that continuous variables obey a nonparanormal distribution. We demonstrate the validity of our approach for causal discovery on simulated data as well as on two real-world data sets from a monetary incentive delay task and a reversal learning task. Our results help in the understanding of the etiology of attention-deficit/hyperactivity disorder (ADHD).

Highlights

In recent years, the use of causal discovery in the field of medical research has become increasingly popular.
We considered the Waste Incinerator Network in two settings: extremely high and medium correlation between variables.
The simulation study shows that, for directed graphical models, the expectation maximization (EM) algorithm performs better than Spearman with pairwise correlation, mean imputation, and list-wise deletion when the percentage of missing values is high, and gives similar results when the percentage is low.

Summary

Introduction

The use of causal discovery in the field of medical research has become increasingly popular. In [1,34,53], the authors propose different methods to estimate the correlation matrix for data with missing values and mixed variables and, based on this correlation matrix, learn the structure of the undirected graphical model. Even though the methods considered in this paper for estimating correlation matrices perform similarly for the undirected graphical model, our analysis suggests that they affect the accuracy of a causal discovery algorithm differently. The second data set studies how problems with learning from reinforcement are associated with ADHD symptoms, using a probabilistic reversal learning (PRL) task. Based on these data, we build two causal models that provide a deeper understanding of the altered reward processing and reversal learning in adolescents with ADHD than standard statistical tests.
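To make the idea of estimating a correlation matrix from incomplete data concrete, the following is a minimal sketch of one of the simpler estimators named in the highlights: Spearman rank correlation with pairwise deletion. It is not the authors' implementation, and it does not show the EM variant. It assumes two continuous variables that are monotone transformations of correlated Gaussians, values missing completely at random (a stronger assumption than missing at random), and the standard Gaussian-copula relation r = 2 sin(pi * rho_s / 6) for mapping Spearman's rho to a latent correlation. All function names are hypothetical.

```python
import math
import random


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_pairwise(x, y):
    """Spearman's rho using only rows where both values are observed (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))


def latent_correlation(rho_s):
    """Map Spearman's rho to the latent Gaussian correlation (Gaussian copula relation)."""
    return 2.0 * math.sin(math.pi * rho_s / 6.0)


def correlation_matrix(columns):
    """Pairwise-deletion estimate of the latent correlation matrix."""
    p = len(columns)
    corr = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            r = latent_correlation(spearman_pairwise(columns[i], columns[j]))
            corr[i][j] = corr[j][i] = r
    return corr


def simulate(n=2000, rho=0.6, missing=0.2, seed=0):
    """Toy data: monotone transforms of correlated Gaussians with random gaps."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        x.append(None if rng.random() < missing else math.exp(z1))
        y.append(None if rng.random() < missing else z2 ** 3)
    return [x, y]


if __name__ == "__main__":
    estimate = correlation_matrix(simulate())
    print(f"estimated latent correlation: {estimate[0][1]:.3f} (simulated with 0.6)")
```

Because each entry is computed from a different subset of rows, a matrix built this way is not guaranteed to be positive semidefinite, which is one reason to compare it against alternatives before passing it to a structure-learning algorithm.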

Background

Related study and motivation

Structure learning

Undirected graphical models

Proposed method

Simulation study

MID tasks study

Reversal task study

Discussion and conclusions

Compliance with ethical standards

Journal: International Journal of Data Science and Analytics
Publication Date: Dec 2, 2016
Citations: 9
License type: open-access

Similar Papers

Causal Discovery from Medical Data: Dealing with Missing Values and a Mixture of Discrete and Continuous Data
Elena Sokolova ... Perry Groot
01 Jan 2015

Author response: Applying causal discovery to single-cell analyses using CausalCell
Yujian Wen ... Hao Zhu
23 Aug 2022

Decision letter: Applying causal discovery to single-cell analyses using CausalCell
Babak Momeni ... Anna Akhmanova
14 Aug 2022

Causal Discovery from Databases with Discrete and Continuous Variables
Elena Sokolova ... Tom Claassen
01 Jan 2014
