Abstract

We find that local similarity is an essential factor in data augmentation for deep learning tasks on time series data, which are prevalent in domains such as smart healthcare, intelligent transportation, and smart finance. Through empirical and theoretical analysis, we show that deep learning models achieve excellent performance only when the augmentation method maintains an appropriate intensity of local similarity: during augmentation, intra-class local similarity that is too high or too low degrades model performance. Based on this finding, we propose a time series augmentation method built on intra-class Similarity Mixing (SimMix), which accurately controls augmentation intensity by quantifying and adjusting the similarity between augmented and original samples. Grounded in PAC (Probably Approximately Correct) learning theory, we design a cutmix strategy for non-equal-length segments that avoids the semantic information loss and noise introduction of traditional methods. Extensive experiments on 10 real-world datasets demonstrate that the proposed method outperforms the state of the art by a large margin.
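The core idea described above can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the segment-length range, the acceptance band, and the use of Pearson correlation as the similarity measure are all assumptions introduced here for demonstration.

```python
import numpy as np

def simmix_augment(x, donor, sim_low=0.6, sim_high=0.9, max_tries=10, seed=None):
    """Splice a random-length segment from a same-class donor series into x,
    keeping only augmentations whose similarity to x lies in [sim_low, sim_high].

    Hypothetical sketch: parameter names and the correlation-based similarity
    measure are illustrative assumptions, not the paper's formulation.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    for _ in range(max_tries):
        # Non-equal-length cutmix: the spliced segment length varies per attempt.
        seg_len = int(rng.integers(max(1, n // 8), n // 2 + 1))
        start = int(rng.integers(0, n - seg_len + 1))
        aug = x.copy()
        aug[start:start + seg_len] = donor[start:start + seg_len]
        # Quantify similarity between the augmented and original sample
        # (Pearson correlation used here as a stand-in metric).
        sim = np.corrcoef(x, aug)[0, 1]
        if sim_low <= sim <= sim_high:
            return aug
    return x  # fall back to the original if no attempt met the target band
```

Rejecting candidates outside the similarity band is one simple way to keep augmentation intensity in the "appropriate" regime the abstract describes: too high a similarity adds little diversity, too low a similarity risks destroying class semantics.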
