Huge amount of data are nowadays produced by a large and disparate family of sensors, which typically measure multiple variables over time. Such rich information can be profitably organized as multivariate time-series. Collect enough labelled samples to set up supervised analysis for such kind of data is challenging while a reasonable assumption is to dispose of a limited background knowledge that can be injected in the analysis process. In this context, semi-supervised clustering methods represent a well suited tool to get the most out of such reduced amount of knowledge. With the aim to deal with multivariate time-series analysis under a limited background knowledge setting, we propose a semi-supervised (constrained) deep embedding time-series clustering framework that exploits knowledge supervision modeled as Must- and Cannot-link constraints. More in detail, our proposal, named conDetSEC (constrained Deep embedding time SEries Clustering), is based on Gated Recurrent Units (GRUs) with the aim to explicitly manage the temporal dimension associated to multi-variate time series data. conDetSEC implements a procedure in which an embedding generation step is combined with a clustering refinement step. Both steps exploit the small amount of available knowledge provided by Must- and Cannot-link constraints. More specifically, during the data embedding generation the constraints are used by jointly optimizing the network parameters via both unsupervised and semi-supervised tasks, while at the refinement step they are used in conjunction with the goal to stretch the embedding manifold towards the clustering centroids to recover a more clear cluster structure. Experimental evaluation on real-world benchmarks coming from diverse domains has highlighted the effectiveness of our proposal in comparison with state-of-the-art unsupervised and semi-supervised time-series clustering methods.
Read full abstract