Abstract
Most real-world applications of electronic health records (EHRs) involve temporal data with skewed class distributions. Imbalanced classification becomes more difficult on sporadic temporal data, in which variables are correlated and some values are missing. A common solution for classification with imbalanced data is oversampling, which generates new samples to rebalance the classes. However, traditional oversampling methods usually change the data distribution and thereby introduce bias. This paper proposes a self-adaptive integrated oversampling method for imbalanced sporadic temporal data in EHRs. Masking vectors and density vectors are introduced to measure the distribution of missing values in each sample, and the minority samples are divided into high-density and sparse-density samples. We extend existing resampling strategies by combining a subsample alignment method with a structure-preserving oversampling method. Weights based on sample differences are used to improve classification performance. Furthermore, a filter mechanism is proposed to remove noisy samples efficiently. Experimental results show that the proposed method outperforms traditional resampling methods in terms of the AUC, F1, and G-mean evaluation metrics.
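The role of the masking and density vectors can be illustrated with a minimal sketch. The abstract does not give their exact definitions, so everything below is an assumption made for illustration: a masking vector marks each time step as observed (1) or missing (0), a sample's density is taken to be its fraction of observed entries, and the hypothetical `threshold` of 0.5 separates high-density from sparse-density minority samples. The function names are invented here and are not the authors' implementation.

```python
import math

def masking_vector(series):
    """Return 1 where a value was observed and 0 where it is missing (None or NaN)."""
    return [
        0 if v is None or (isinstance(v, float) and math.isnan(v)) else 1
        for v in series
    ]

def density(sample):
    """Fraction of observed entries over all variables of one sample.

    Assumed definition: the abstract only says that density vectors measure
    the distribution of missing values.
    """
    masks = [masking_vector(variable) for variable in sample]
    total = sum(len(m) for m in masks)
    return sum(sum(m) for m in masks) / total if total else 0.0

def split_by_density(samples, threshold=0.5):
    """Split samples into (high_density, sparse_density) groups.

    The threshold is a hypothetical parameter, not a value from the paper.
    """
    high, sparse = [], []
    for s in samples:
        (high if density(s) >= threshold else sparse).append(s)
    return high, sparse

# Two toy minority samples, each with two variables over four time steps.
minority = [
    [[1.0, None, 0.8, 0.9], [120.0, 118.0, None, 121.0]],  # 6 of 8 observed
    [[None, None, 1.1, None], [None, 119.0, None, None]],  # 2 of 8 observed
]
high, sparse = split_by_density(minority)
```

In this toy example the first sample falls into the high-density group and the second into the sparse-density group, which is the kind of partition the two resampling strategies would then operate on.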