SCMFTS: Scalable and Distributed Complexity Measures and Features for Univariate and Multivariate Time Series in Big Data Environments

Francisco J. Baldán, José M. Benítez, Daniel Peralta, Yvan Saeys

doi:10.1007/s44196-021-00036-7

Abstract

Time series data are becoming increasingly important due to the interconnectedness of the world. Classical problems keep growing in size and require ever more resources for their processing, and Big Data technologies offer many solutions. Although the principal algorithms for traditional vector-based problems are available in Big Data environments, the lack of tools for time series processing in these environments needs to be addressed. In this work, we propose a scalable and distributed time series transformation for Big Data environments based on well-known time series features (SCMFTS), which allows practitioners to apply traditional vector-based algorithms to time series problems. The proposed transformation, combined with the algorithms available in Spark, improved on the best state-of-the-art results for the Wearable Stress and Affect Detection dataset, which is the biggest publicly available multivariate time series dataset in the University of California Irvine (UCI) Machine Learning Repository. In addition, SCMFTS showed a linear relationship between its runtime and the number of processed time series, demonstrating linearly scalable behavior, which is mandatory in Big Data environments. SCMFTS has been implemented in the Scala programming language for the Apache Spark framework, and the code is publicly available.

Highlights

Nowadays, we can find devices generating data anywhere and at any time [2].
We propose a scalable and distributed time series transformation based on well-known time series features, named SCMFTS, to provide an alternative vector-based representation of time series that enables the use of the traditional machine learning techniques available in Big Data environments.
A high number of data points generally produces high runtimes, but this does not hold when the runtimes for the variables c_ACCx, c_ACCy, or c_ACCz are compared with that of w_BVP. The difference arises from the frequency value of these variables, which enters the calculation of the time series features and therefore affects the runtime. This effect is not related to the Spark implementation; it depends on the structure of the input time series.

Summary

Introduction

We can find devices generating data anywhere and at any time [2]. With the expansion of new technologies, the volume of data generated is growing by leaps and bounds. We propose a scalable and distributed time series transformation based on well-known time series features, named SCMFTS, to provide an alternative vector-based representation of time series that enables the use of the traditional machine learning techniques available in Big Data environments. We have implemented it in Scala on Apache Spark, which guarantees fully scalable behavior; it is the first proposal of this type made for Big Data environments. SCMFTS allows practitioners to tackle problems that would otherwise be intractable and to improve the results obtained through the additional information provided by the new time series features.
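As a rough illustration of the idea behind such a transformation, the sketch below maps each time series to a fixed-length feature vector so that any vector-based learner can consume it. It is plain Python rather than the authors' Scala/Spark code, and the five features used here (mean, standard deviation, minimum, maximum, lag-1 autocorrelation) are illustrative assumptions, not the SCMFTS feature set. The names `FEATURES`, `to_feature_vector`, and `transform` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence


def lag1_autocorrelation(series: Sequence[float]) -> float:
    """Lag-1 autocorrelation; returns 0.0 for a constant series."""
    m = mean(series)
    denom = sum((x - m) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[i] - m) * (series[i + 1] - m)
              for i in range(len(series) - 1))
    return num / denom


# Illustrative feature set (an assumption, not the features used by SCMFTS).
FEATURES: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "std": pstdev,
    "min": min,
    "max": max,
    "lag1_acf": lag1_autocorrelation,
}


def to_feature_vector(series: Sequence[float]) -> List[float]:
    """Turn one time series of any length into a fixed-length vector."""
    return [float(feature(series)) for feature in FEATURES.values()]


def transform(dataset: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply the per-series transformation independently to every series."""
    return [to_feature_vector(series) for series in dataset]


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0, 4.0], [5.0, 5.0, 5.0]]
    for row in transform(data):
        print(row)
```

Because each series is processed independently of the others, the per-series function could in principle be handed to a distributed map (for example over a Spark RDD), which is consistent with the linear relationship between runtime and number of series reported in the abstract. The distributed part is not shown here.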

Time Series in Big Data

Big Data Frameworks

Scalable and Distributed Time Series Transformation Proposal

Transformed Data

Experimental Design

Datasets

Measures and Methodology

Models

Hardware and Software

Results

Performance Results on WESAD

Conclusions


Journal: International Journal of Computational Intelligence Systems	Publication Date: Nov 2, 2021
Citations: 3	License type: open-access
