Abstract

Data imbalance is a practical and crucial issue in deep learning. Moreover, real-world datasets, such as electronic health records (EHR), often suffer from high missing rates. Both issues can be understood as noise in the data that may lead to poor generalization for standard deep-learning algorithms. This paper introduces a novel meta-learning approach to deal with both kinds of noise in an EHR dataset for a binary classification task. The approach leverages information from a selected subset of balanced, low-missing-rate data to automatically assign a weight to each training sample. These weights amplify informative samples and suppress uninformative ones during training. Furthermore, the approach is model-agnostic: it can be applied to any deep learning-based architecture, and it handles a high imbalance ratio and a high missing rate simultaneously. Through experiments, we demonstrate that models trained with this meta-learning approach perform better than the same models trained without it in extreme cases. In the most extreme case, with an imbalance ratio of 172 and a 74.6% missing rate, our method outperforms the original model without meta-learning by as much as 10.3% in area under the receiver operating characteristic curve (AUROC) and 3.2% in area under the precision-recall curve (AUPRC). Our results mark a first step towards training a robust model on extremely noisy EHR datasets.
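
To make the idea of weighting samples from a small, clean subset concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not state the weighting rule, so the sketch assumes one common choice: a training sample is weighted by how well its loss gradient aligns with the mean gradient on the clean (meta) subset, with negative alignments clipped to zero. The logistic-regression model, the function names (`per_sample_grads`, `meta_weights`, `weighted_step`), and the assumption that missing values are already imputed are all illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_sample_grads(theta, X, y):
    """Per-sample gradient of the logistic loss w.r.t. theta, shape (n, d)."""
    p = sigmoid(X @ theta)
    return (p - y)[:, None] * X

def meta_weights(theta, X_train, y_train, X_meta, y_meta):
    """Assumed rule: weight = clipped alignment with the meta-set gradient."""
    g_train = per_sample_grads(theta, X_train, y_train)
    g_meta = per_sample_grads(theta, X_meta, y_meta).mean(axis=0)
    # Samples whose gradient opposes the clean-subset gradient get weight 0.
    w = np.maximum(0.0, g_train @ g_meta)
    total = w.sum()
    # Normalize so the weights sum to 1 (left as zeros if all were clipped).
    return w / total if total > 0 else w

def weighted_step(theta, X_train, y_train, w, lr=0.5):
    """One gradient step on the weighted training loss."""
    g = per_sample_grads(theta, X_train, y_train)
    return theta - lr * (w[:, None] * g).sum(axis=0)

# Toy data: two identical inputs with conflicting labels; the meta set
# (hypothetical, standing in for the balanced, low-missing-rate subset)
# agrees with the first label.
theta0 = np.zeros(2)
X_train = np.array([[1.0, 0.0], [1.0, 0.0]])
y_train = np.array([1.0, 0.0])
X_meta = np.array([[1.0, 0.0]])
y_meta = np.array([1.0])

w = meta_weights(theta0, X_train, y_train, X_meta, y_meta)
theta1 = weighted_step(theta0, X_train, y_train, w)
```

In this toy case the sample that agrees with the meta set receives all of the weight and the conflicting one is suppressed. In a deep model the same alignment would typically be computed with automatic differentiation rather than a closed-form gradient.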
