Abstract

As large amounts of heterogeneous biomedical data become available, numerous methods have been developed to integrate such datasets and extract complementary knowledge from multiple domains. Recently, deep learning approaches have shown promising results in a variety of research areas. However, applying deep learning requires expertise in constructing a deep architecture that can handle multimodal longitudinal data. In this paper, we therefore develop a deep learning-based Python package for data integration: the deep learning-based multimodal longitudinal data integration framework (MildInt), which provides a preconstructed deep learning architecture for classification tasks. MildInt comprises two learning phases: learning a feature representation from each modality of data, and training a classifier for the final decision. Adopting a deep architecture in the first phase yields more task-relevant feature representations than a linear model. In the second phase, a logistic regression classifier is used for detecting and investigating biomarkers from the multimodal data. By combining the linear model and the deep learning model, higher accuracy and better interpretability can be achieved. We validated the performance of our package using simulated data and real data. For the real data, as a pilot study, we used clinical and multimodal neuroimaging datasets in Alzheimer's disease to predict disease progression. MildInt can integrate multiple forms of numerical data, including time series and non-time series data, to extract complementary features from a multimodal dataset.

Highlights

  • As the amount of biomedical data grows exponentially, data integration methods that can extract biological insight by incorporating heterogeneous data are required (Larranaga et al., 2006)

  • The classification performance of our package is compared with other well-known methods such as logistic regression (LR), random forest (RF), and support vector machine (SVM)

  • In the experiment with real data, four modalities, namely cognitive performance, cerebrospinal fluid (CSF) measures, demographic data, and magnetic resonance imaging (MRI) data of Alzheimer's disease patients, are used for mild cognitive impairment (MCI) conversion prediction, formulated as a binary classification task


Introduction

As the amount of biomedical data grows exponentially, the development of data integration methods that can extract biological insight by incorporating heterogeneous data is required (Larranaga et al., 2006). In the field of translational research, deep learning-based predictive models have shown promising results (Chaudhary et al., 2017; Choi et al., 2017; Lu et al., 2018; Lee et al., 2019). These previous studies integrated multiple domains of data using deep learning. In Lee et al. (2019), a multimodal gated recurrent unit (GRU) model was used to integrate cognitive performance, cerebrospinal fluid (CSF), demographic data, and neuroimaging data to predict Alzheimer's disease (AD) progression. Data integration is believed to improve classification performance by extracting complementary information from each domain of source.
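The two-phase idea described above (per-modality GRU encoders followed by a logistic regression classifier on the concatenated representations) can be sketched in plain NumPy. This is an illustrative sketch, not MildInt's actual API: the modality names, feature dimensions, and random weights are all hypothetical, and no training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_forward(x_seq, W_z, U_z, W_r, U_r, W_h, U_h):
    """Run a minimal GRU over a sequence; return the final hidden state."""
    h = np.zeros(U_z.shape[0])
    for x in x_seq:
        z = sigmoid(W_z @ x + U_z @ h)               # update gate
        r = sigmoid(W_r @ x + U_r @ h)               # reset gate
        h_cand = np.tanh(W_h @ x + U_h @ (r * h))    # candidate state
        h = (1 - z) * h + z * h_cand
    return h

def random_gru_params(input_dim, hidden_dim):
    W = lambda: rng.normal(scale=0.1, size=(hidden_dim, input_dim))
    U = lambda: rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
    return (W(), U(), W(), U(), W(), U())

# Phase 1: one GRU encoder per modality (hypothetical feature dimensions)
modalities = {"cognitive": 5, "csf": 3, "mri": 8}
hidden_dim, seq_len = 4, 6
params = {m: random_gru_params(d, hidden_dim) for m, d in modalities.items()}

# One subject's longitudinal data: a sequence of visits per modality
subject = {m: rng.normal(size=(seq_len, d)) for m, d in modalities.items()}
features = np.concatenate(
    [gru_forward(subject[m], *params[m]) for m in modalities]
)

# Phase 2: logistic regression on the concatenated representations
w = rng.normal(scale=0.1, size=features.shape[0])
b = 0.0
p_conversion = sigmoid(w @ features + b)  # probability of MCI conversion
```

In practice the GRU weights and the classifier weights would be learned from data; the sketch only shows how the final hidden state of each modality-specific GRU is concatenated into a single feature vector before the linear classifier makes the final decision.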

