An Introductory Approach to Time-Series Data Preparation and Analysis

Edward Baumann, Charles Hsu, Taylor Cox, Hayley Buba

doi:10.36001/phmconf.2023.v15i1.3561

Abstract

Machine learning (ML) and artificial intelligence (AI) have widespread applications and have revolutionized many industries, driven by advanced and mature sensor technology and by large-scale data collection efforts. One of the key tasks for effective ML/AI operations is extracting and identifying useful and usable data in order to uncover complex interrelationships and solve problems efficiently. The usefulness of data is its value and meaning within the desired model, while the usability of data refers to how easily it can be used in a model. Complex supervised and unsupervised ML models, which used to be the domain of cutting-edge scientists and academics, can now be invoked as basic function calls in public domain packages within Python, R, MATLAB, and other languages. While these functions require effective data preprocessing to overcome the unforeseen effects of real-world data quality (e.g., missing data, environmental noise, and signals sampled at different rates that must be synchronized), their ease of use means they are often called with little to no understanding of the underlying math or of ways to work through the data set efficiently. The approachability provided by the packages enables users to dive into complex problem sets with little advance preparation. However, the resulting lack of understanding will inevitably cause problems, skew results, or force the user to take a less efficient path to a similar answer. Each package provides relatively simple examples that deal with specific public data sets, yet few provide the background knowledge and comprehensive methods required to build the inputs for extensive and effective time-series data modeling. Typically, the complex nature of time-series data means that using it in ML/AI predictive models requires an in-depth understanding of signal analysis and subject-matter expertise in the domain.
This paper provides the reader with an overview of the problems associated with time-series data modeling, proposes a common set of preprocessing steps to follow, demonstrates a taxonomy for classifying time-series data, provides introductory reasoning regarding the underlying process, and discusses the models that would benefit from such a methodology. The goal is to equip readers who are not domain experts with updated and approachable techniques for finding which features to focus on when preprocessing their time-series data.
