Abstract
A highly comparative, feature-based approach to time-series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series. These features are drawn from across the scientific time-series analysis literature and include summaries of time series in terms of their correlation structure, distribution, entropy, stationarity, scaling properties, and fits to a range of time-series models. After thousands of features are computed for each time series in a training set, those most informative of the class structure are selected using greedy forward feature selection with a linear classifier. The resulting feature-based classifiers automatically learn the differences between classes using a reduced number of time-series properties, and they avoid the need to calculate distances between time series. Representing time series in this way reduces dimensionality by orders of magnitude, allowing the method to perform well on very large datasets containing long time series or time series of different lengths. For many of the datasets studied, classification performance exceeded that of conventional instance-based classifiers, including one-nearest-neighbor classifiers using Euclidean distance and dynamic time warping. Most importantly, the selected features provide an understanding of the properties of the dataset, insight that can guide further scientific investigation.
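The greedy forward selection step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature matrix is assumed to be precomputed (one row per time series, one column per feature), a nearest-centroid classifier on z-scored features is swapped in for the linear classifier (its pairwise decision boundaries are hyperplanes), and leave-one-out accuracy is assumed as the selection criterion. All function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(X):
    """Standardize each feature column to zero mean and unit variance."""
    cols = list(zip(*X))
    # Constant columns get a scale of 1.0 so they map to all zeros.
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]


def nearest_centroid_predict(train_X, train_y, x):
    """Assumed stand-in for a linear classifier: pick the closest class centroid."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [mean(c) for c in zip(*rows)]
    # Ties go to the first label in sorted order.
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy using only the given feature columns."""
    sub = [[row[j] for j in features] for row in X]
    correct = 0
    for i in range(len(sub)):
        train_X = sub[:i] + sub[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, sub[i]) == y[i]:
            correct += 1
    return correct / len(sub)


def greedy_forward_selection(X, y, max_features=3):
    """Repeatedly add the single feature that most improves LOO accuracy."""
    # For simplicity, features are z-scored once using all rows
    # (a small information leak into the leave-one-out estimate).
    Xz = zscore_columns(X)
    selected, best_acc = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scores = [(loo_accuracy(Xz, y, selected + [j]), j) for j in remaining]
        acc, j = max(scores, key=lambda s: s[0])
        if acc <= best_acc:
            break  # stop when no remaining feature improves accuracy
        selected.append(j)
        remaining.remove(j)
        best_acc = acc
    return selected, best_acc
```

Stopping as soon as no candidate improves accuracy keeps the selected set small, so each retained column can be read as a named time-series property rather than an opaque coordinate.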
Highlights
Time series, measurements of some quantity taken over time, are measured and analyzed across the scientific disciplines, including human heartbeats in medicine, cosmic rays in astrophysics, rates of inflation in economics, air temperatures in climate science, and sets of ordinary differential equations in mathematics.
The problem of extracting useful information from time series has been treated in a variety of ways, including an analysis of the distribution, correlation structures, measures of entropy or complexity, stationarity estimates, fits to various linear and nonlinear time-series models, and quantities derived from the physical nonlinear time-series analysis literature.
For datasets of short patterns whose values through time can serve as the basis for computing a meaningful distance between them, dynamic time warping (DTW) has been shown to set a high benchmark for classification performance [33].
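For comparison with the feature-based approach, the instance-based baselines can be sketched as a one-nearest-neighbor (1-NN) classifier with a pluggable distance. The sketch assumes squared pointwise differences as the local DTW cost and an optional Sakoe-Chiba band; both are common conventions rather than details given in this text, and the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance; only defined for series of equal length."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b, window=None):
    """DTW distance with squared local cost (an assumed convention).

    `window` is the half-width of a Sakoe-Chiba band; None means unconstrained.
    The band is widened to |len(a) - len(b)| so the end cell stays reachable.
    """
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor alignments.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])


def one_nn_classify(train, labels, query, distance):
    """Return the label of the training series nearest to `query`."""
    best = min(range(len(train)), key=lambda k: distance(train[k], query))
    return labels[best]
```

Unlike Euclidean distance, DTW can compare series of different lengths, but filling the table costs time proportional to the product of the two lengths for every pair compared, which helps explain why avoiding distance computations pays off on large datasets of long series.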
Summary
Time series, measurements of some quantity taken over time, are measured and analyzed across the scientific disciplines, including human heartbeats in medicine, cosmic rays in astrophysics, rates of inflation in economics, air temperatures in climate science, and sets of ordinary differential equations in mathematics. The problem of extracting useful information from time series has been treated in a variety of ways, including an analysis of the distribution, correlation structures, measures of entropy or complexity, stationarity estimates, fits to various linear and nonlinear time-series models, and quantities derived from the physical nonlinear time-series analysis literature. This broad range of scientific methods for understanding the properties and dynamics of time series has received less attention in the temporal data mining literature, which treats large databases of time series, typically with the aim of either clustering or classifying the data [1], [2], [3].
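To make the feature families named above concrete, here is a minimal sketch that maps a series of any length to a fixed set of interpretable summaries: distribution (mean, standard deviation, skewness), correlation structure (lag-1 autocorrelation and the first zero crossing of the autocorrelation function), entropy (of a value histogram), and a crude stationarity proxy (spread of window means). These particular features, their parameters (10 histogram bins, 5 windows), and the function names are illustrative assumptions; they are not the feature database used in the paper.

```python
import math
from statistics import mean, pstdev


def autocorr(x, lag):
    """Sample autocorrelation at `lag` (biased estimator)."""
    mu, n = mean(x), len(x)
    var = sum((v - mu) ** 2 for v in x)
    if var == 0:
        return 0.0
    return sum((x[t] - mu) * (x[t + lag] - mu) for t in range(n - lag)) / var


def first_zero_crossing_acf(x):
    """Smallest lag at which the autocorrelation drops to zero or below."""
    for lag in range(1, len(x)):
        if autocorr(x, lag) <= 0:
            return lag
    return len(x)


def histogram_entropy(x, bins=10):
    """Shannon entropy (nats) of an equal-width histogram of the values."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for v in x:
        # The maximum value falls in the last bin.
        counts[min(int((v - lo) / (hi - lo) * bins), bins - 1)] += 1
    n = len(x)
    return -sum(c / n * math.log(c / n) for c in counts if c)


def window_mean_spread(x, n_windows=5):
    """Assumed stationarity proxy: std of window means relative to overall std."""
    size = len(x) // n_windows
    sd = pstdev(x)
    if size == 0 or sd == 0:
        return 0.0
    means = [mean(x[k * size:(k + 1) * size]) for k in range(n_windows)]
    return pstdev(means) / sd


def extract_features(x):
    """Map a series of any length to the same fixed set of named features."""
    mu, sd = mean(x), pstdev(x)
    skew = mean(((v - mu) / sd) ** 3 for v in x) if sd else 0.0
    return {
        "mean": mu,
        "std": sd,
        "skewness": skew,
        "acf_lag1": autocorr(x, 1),
        "acf_first_zero": first_zero_crossing_acf(x),
        "hist_entropy": histogram_entropy(x),
        "window_mean_spread": window_mean_spread(x),
    }
```

Because every series yields the same keys regardless of its length, the outputs can be stacked row by row into a feature matrix for a downstream selection step, which is how series of different lengths become directly comparable.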
Published in: IEEE Transactions on Knowledge and Data Engineering