Abstract

Time series classification is considered one of the most challenging problems in data mining and is used across a broad range of fields. An imbalanced class distribution makes classifying minority-class time series objects even harder. A common approach is to extract or select representative features that retain the structure of a time series object. However, when the data distribution is imbalanced, traditional features cannot represent time series effectively, especially in multi-class settings. In this paper, shapelets, a time series mining primitive, are applied to extract the most representative subsequences. In particular, we verify that information gain (IG) is unsuitable as a shapelet quality measure for imbalanced data sets. We therefore propose two quality measures for shapelets, one for imbalanced binary problems and one for imbalanced multi-class problems. From the extracted shapelets, we select a diversified top-k set according to the new quality measures to represent the k best features, and we implement this procedure on the MapReduce framework. Lastly, two oversampling methods based on shapelet features are proposed to re-balance binary and multi-class time series data sets. We validate our methods on benchmark data sets by comparing them with canonical classifiers and state-of-the-art time series algorithms. The results show that the proposed algorithms are more competitive than the compared methods, with statistically significant differences.
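
To make the shapelet terminology concrete, the following is a minimal sketch of the standard building blocks: the distance from a series to a candidate shapelet (the minimum Euclidean distance over all same-length windows) and the information gain of a distance-threshold split. The function names and toy data are illustrative assumptions, not the paper's implementation, and the quality measures the paper proposes are not reproduced here. The last lines illustrate one commonly noted limitation of IG: it is bounded by the entropy of the class labels, which is small when classes are heavily imbalanced.

```python
import math

def subsequence_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any
    same-length window of the series (no normalization, for brevity)."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + i] - shapelet[i]) ** 2 for i in range(m)))
        best = min(best, d)
    return best

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(distances, labels, threshold):
    """IG of splitting the data set at a shapelet-distance threshold."""
    left = [y for d, y in zip(distances, labels) if d <= threshold]
    right = [y for d, y in zip(distances, labels) if d > threshold]
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

# Hypothetical toy data: a perfect split on a balanced set and on a 9:1 set.
balanced_ig = information_gain([0.1, 0.2, 0.9, 1.0], ["a", "a", "b", "b"], 0.5)
imbalanced_labels = ["maj"] * 9 + ["min"]
imbalanced_dists = [1.0] * 9 + [0.1]
imbalanced_ig = information_gain(imbalanced_dists, imbalanced_labels, 0.5)
```

In this toy setting both splits separate the classes perfectly, yet the imbalanced one receives a much lower IG because the score cannot exceed the label entropy.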
