Abstract

Many vital real-world applications involve time-series data with skewed class distributions. Compared with traditional imbalanced learning problems, classifying imbalanced time-series data is more challenging because of its high dimensionality and strong inter-variable correlation. This paper proposes a structure-preserving Oversampling method for High-dimensional Imbalanced Time-series classification (OHIT). OHIT uses a density-ratio-based shared nearest neighbor clustering algorithm to capture the modes of the minority class in high-dimensional space. For each mode, it applies a shrinkage technique for large-dimensional covariance matrices to obtain an accurate and reliable covariance structure. Structure-preserving synthetic samples are then generated from a multivariate Gaussian distribution with the estimated covariance matrix. In addition, to further improve the classification of imbalanced time-series data, we integrate OHIT into a boosting framework to obtain a new ensemble algorithm, OHITBoost. Extensive experiments on several publicly available time-series datasets (both unimodal and multimodal) demonstrate the effectiveness of OHIT and OHITBoost in terms of F1, G-mean, and AUC.
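The per-mode generation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the mode labels are taken as an input (a hypothetical stand-in for the density-ratio-based shared nearest neighbor clustering), the Ledoit-Wolf estimator (shrinkage toward a scaled identity) is assumed as the shrinkage technique, each Gaussian is centered on the mode mean, and synthetic samples are allocated to modes in proportion to mode size. The function names `shrunk_covariance` and `oversample_modes` are hypothetical.

```python
import numpy as np


def shrunk_covariance(X):
    """Ledoit-Wolf shrinkage toward a scaled identity (assumed estimator).

    X has shape (n_samples, n_features); n_samples may be smaller than
    n_features, which is exactly when the sample covariance is singular.
    Returns (covariance, shrinkage_intensity).
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)                 # center each feature
    S = Xc.T @ Xc / n                       # ML sample covariance (p x p)
    mu = np.trace(S) / p                    # scale of the identity target
    target = mu * np.eye(p)
    delta2 = np.sum((S - target) ** 2) / p  # distance of S from the target
    if delta2 == 0.0:                       # S already equals the target
        return S, 0.0
    # Estimated variance of S: mean squared deviation of x x^T from S.
    beta2_bar = sum(np.sum((np.outer(x, x) - S) ** 2) for x in Xc) / (n ** 2 * p)
    shrinkage = min(beta2_bar, delta2) / delta2   # lies in [0, 1]
    cov = shrinkage * target + (1.0 - shrinkage) * S
    return cov, shrinkage


def oversample_modes(X_min, labels, n_new, seed=None):
    """Generate n_new synthetic minority samples, mode by mode.

    labels assigns each minority sample to a mode; it is an input here,
    a hypothetical stand-in for the paper's clustering step.
    """
    rng = np.random.default_rng(seed)
    modes, sizes = np.unique(labels, return_counts=True)
    # Allocate synthetic samples to modes in proportion to mode size (assumed).
    counts = np.floor(n_new * sizes / sizes.sum()).astype(int)
    for i in np.argsort(-sizes)[: n_new - counts.sum()]:
        counts[i] += 1                      # hand out the rounding remainder
    synthetic = []
    for mode, k in zip(modes, counts):
        X_mode = X_min[labels == mode]
        cov, _ = shrunk_covariance(X_mode)
        # Draw from N(mode mean, shrunk covariance) to keep the mode's structure.
        synthetic.append(rng.multivariate_normal(X_mode.mean(axis=0), cov, size=k))
    return np.vstack(synthetic)
```

Shrinking toward a scaled identity keeps the estimate positive definite even when a mode has fewer samples than time steps, which is the high-dimensional regime the abstract targets; the plain sample covariance would be singular there.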
