Abstract

In many data mining tasks, high dimensionality has been shown to pose significant problems, commonly referred to as different aspects of the curse of dimensionality. In this paper, we investigate one such aspect, hubness, in the time-series domain: the tendency of some instances in a data set to become hubs by being included in unexpectedly many k-nearest-neighbor lists of other instances. Through empirical measurements on a large collection of time-series data sets, we demonstrate that the hubness phenomenon is caused by the high intrinsic dimensionality of time-series data, and we shed light on the mechanism through which hubs emerge, focusing on the popular and successful dynamic time warping (DTW) distance. We also investigate the interaction between hubness and the information provided by class labels by considering label matches and mismatches between neighboring time series. Following these findings, we formulate a framework for categorizing time-series data sets based on measurements that reflect hubness and the diversity of class labels among nearest neighbors. The framework allows one to assess whether hubness can be successfully exploited to improve the performance of k-NN classification. Finally, the merits of the framework are demonstrated through an experimental evaluation of 1-NN and k-NN classifiers, including a proposed weighting scheme designed to make use of hubness information. Our experimental results show that, in the majority of cases, the framework correctly reflects the circumstances in which hubness information can effectively be employed in k-NN time-series classification.
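To make the hubness notion concrete, the following is a minimal illustrative sketch (not the authors' implementation) of how the k-occurrence count N_k of each series could be measured under the DTW distance: N_k(x) is the number of k-nearest-neighbor lists of other series that include x, and series with unusually large N_k are the hubs. The function names, the squared point-wise DTW cost, and the toy random-walk data are assumptions made for illustration only.

```python
# Illustrative sketch (assumed, not from the paper): k-occurrence counts under DTW.
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with squared point-wise cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return np.sqrt(cost[n, m])

def k_occurrence_counts(series, k):
    """Return N_k for every series: how many k-NN lists of the other series it joins."""
    n = len(series)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])
    counts = np.zeros(n, dtype=int)
    for i in range(n):
        order = np.argsort(dist[i])
        neighbors = [j for j in order if j != i][:k]  # exclude the query itself
        counts[neighbors] += 1
    return counts

# Hypothetical usage: random-walk series with k = 5; a skewed N_k distribution
# (a few series with very large counts) signals the presence of hubs.
rng = np.random.default_rng(0)
data = [np.cumsum(rng.standard_normal(50)) for _ in range(40)]
N_k = k_occurrence_counts(data, k=5)
print("max N_k:", N_k.max())
```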
