Imbalanced target prediction with pattern discovery on clinical data repositories

Tak-Ming Chan, Jie Jiang, Choo-Chiap Chiau, Yuxi Li, Yong Huo, Jane Zhu

doi:10.1186/s12911-017-0443-3

Open Access

https://doi.org/10.1186/s12911-017-0443-3


Abstract

Background: Clinical data repositories (CDR) have great potential to improve outcome prediction and risk modeling. However, most clinical studies require careful study design, dedicated data collection efforts, and sophisticated modeling techniques before a hypothesis can be tested. We aim to bridge this gap, so that clinical domain users can perform first-hand prediction on existing repository data without complicated handling, and obtain insightful patterns of imbalanced targets before a formal study is conducted. We specifically target interpretability for domain users, so that the model can be conveniently explained and applied in clinical practice.

Methods: We propose an interpretable pattern model that is tolerant of noise (missing values) in practice data. To address the challenge of imbalanced targets of interest in clinical research, e.g., deaths accounting for less than a few percent of cases, we employ the geometric mean of sensitivity and specificity (G-mean) as the optimization criterion, for which a simple but effective heuristic algorithm is developed.

Results: We compared pattern discovery to clinically interpretable methods on two retrospective clinical datasets. The thoracic dataset contains 14.9% deaths within 1 year, and the cardiac dataset contains 9.1% deaths. Despite the imbalance challenge evident for the other methods, pattern discovery consistently shows competitive cross-validated prediction performance. Compared to logistic regression, Naïve Bayes, and decision tree, pattern discovery achieves statistically significant (p-values < 0.01, Wilcoxon signed rank test) favorable averaged testing G-means and F1-scores (harmonic mean of precision and sensitivity). Without requiring sophisticated technical processing of data or tweaking, the prediction performance of pattern discovery is consistently comparable to the best achievable performance.

Conclusions: Pattern discovery has been demonstrated to be robust and valuable for target prediction on existing clinical data repositories with imbalance and noise. The prediction results and interpretable patterns can provide insights in an agile and inexpensive way for potential formal studies.
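The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names and the 1,000-patient cohort are hypothetical, and only the 9.1% prevalence is taken from the cardiac dataset described above. The sketch shows why accuracy can look good on an imbalanced target while G-mean exposes a classifier that never detects the minority class.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, FN, TN, FP for binary labels (1 = target event, e.g. death)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp

def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def accuracy(y_true, y_pred):
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return safe_div(tp + tn, tp + fn + tn + fp)

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    return math.sqrt(sensitivity * specificity)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and sensitivity."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)
    return safe_div(2 * precision * sensitivity, precision + sensitivity)

# Hypothetical cohort: 1000 patients, 91 deaths (9.1% prevalence).
y_true = [1] * 91 + [0] * 909

# A classifier that always predicts the majority class (survival).
always_survive = [0] * 1000
print(accuracy(y_true, always_survive))  # 0.909
print(g_mean(y_true, always_survive))    # 0.0

# A hypothetical classifier that flags 70 of the 91 deaths
# at the cost of 200 false alarms among the 909 survivors.
flagging = [1] * 70 + [0] * 21 + [1] * 200 + [0] * 709
print(accuracy(y_true, flagging), g_mean(y_true, flagging), f1_score(y_true, flagging))
```

The majority-class predictor has the higher accuracy of the two, yet its G-mean is zero because its sensitivity is zero. A criterion built on G-mean therefore cannot be satisfied by ignoring the minority class.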

Highlights

Clinical data repositories (CDR) have great potential to improve outcome prediction and risk modeling
It is likely that logistic regression optimizes an accuracy-oriented loss function, which is dominated by the majority class
Pattern discovery shows the advantage of robust and consistent prediction performance even without up-sampling, being the least sensitive to training up-sampling ratios. This is desirable for our target scenario, where domain users would like to discover insights from noisy and imbalanced practice data before they invest heavily in formal studies. For both F1-scores and the geometric mean of sensitivity and specificity (G-mean) at the original prevalence, pattern discovery already shows statistical significance (p-value = 4.43 × 10⁻⁵) over the random baseline, which has fixed results across all up-sampling ratios, so we focus on detailed comparison with the other methods for simplicity
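To make the notion of a training up-sampling ratio concrete, here is a small Python sketch of random minority up-sampling. The page does not define the ratio precisely, so the definition used here is an assumption: `ratio` is the factor by which the number of minority (label 1) training records is multiplied, and `ratio = 1` leaves the data unchanged. The helper name `upsample_minority` and the 1,000-record cohort are hypothetical. Only the 14.9% prevalence comes from the thoracic dataset.

```python
import random

def upsample_minority(rows, labels, ratio, seed=0):
    """Randomly duplicate minority (label 1) records in a training set.

    Assumed definition: after up-sampling, the minority class holds
    about `ratio` times its original number of records.
    """
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    n_extra = int(round((ratio - 1) * len(minority)))
    # Sample duplicates with replacement from the minority records.
    extra = [rng.choice(minority) for _ in range(n_extra)] if minority else []
    idx = list(range(len(labels))) + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]

# Hypothetical training set: 1000 records, 149 deaths (14.9% prevalence).
train_rows = list(range(1000))
train_labels = [1] * 149 + [0] * 851

up_rows, up_labels = upsample_minority(train_rows, train_labels, ratio=3)
print(len(up_labels), sum(up_labels))  # 1298 447
```

In a sketch like this, only the training portion of each cross-validation fold would be up-sampled, while the test portion keeps the original prevalence so that the reported G-means and F1-scores reflect the real-world distribution.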

Summary

Introduction

Clinical data repositories (CDR) have great potential to improve outcome prediction and risk modeling. Traditional clinical studies, whether retrospective or prospective, require tremendous effort in design, data collection, and sophisticated processing before any hypothesis can be tested or a target can be predicted. The dilemma is that, with rich existing data, domain users want to generate initial data-driven hypotheses and gain insight into whether a specific clinical target of interest is predictable and which attributes (predictor variables) should be considered, before they take the more involved route of a formal clinical study. We aim to bridge this gap, so that clinical domain users can perform first-hand prediction on existing repository data without complicated handling, and obtain insightful patterns of imbalanced targets before a formal study is conducted. Imbalanced target prediction is the more challenging setting, and it is a realistic one that offers high practical value for outcome prediction and quality improvement under the real-world distribution of cases and controls

Journal: BMC Medical Informatics and Decision Making
Publication Date: Apr 20, 2017
Citations: 15
License type: open-access
