Abstract

Data-driven models built with machine learning are becoming more prominent in the condition monitoring, maintenance, and prognostics fields. The ability to build these models depends largely on the quality of the data. Of particular importance is the availability of labelled data that describes the conditions to be identified. However, properly labelled data that is usable by many machine learning strategies is a scarce resource. Furthermore, producing high-quality labelled data is expensive, time-consuming, and often inaccurate, given the uncertainty surrounding the labelling process and the annotators.
Active Learning (AL) has emerged as a semi-supervised approach that reduces the cost and time of the labelling process. Its adoption for time series classification has been slow, given the difficulty of extracting and presenting time series information in a form that is easy to understand for the human annotator who assigns the labels. This difficulty arises from the high dimensionality of many of these time series. The challenge is exacerbated by the cold-start problem, in which the initial labelled dataset that typical AL frameworks rely on does not exist, so no initial labels are available for the time series samples. This last challenge is particularly common in condition monitoring applications, where data samples of specific faults or problems often do not exist.
In this article, we present an AL framework for the classification of time series from industrial process data, in particular vibration waveforms originating from condition monitoring applications. In this framework, we deal with the absence of labels for training an initial classification model by introducing a pre-clustering step. This step uses an unsupervised clustering algorithm to identify the number of labels and selects the points with the strongest group membership as the initial samples to be labelled in the active learning step. Furthermore, the framework offers two ways of presenting the information to the annotator: time-series imaging and automatic extraction of statistical features. Our work is motivated by the aim of reducing the effort required to label time-series waveforms while maintaining a high level of accuracy and consistency in those labels. In addition, we study the number of time-series samples that need to be labelled to achieve different levels of classification accuracy, together with the corresponding confidence intervals. These experiments are carried out using vibration signals from a well-known rolling element bearing dataset and typical process data from a production plant.
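The pre-clustering idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain k-means as the unsupervised algorithm, takes the number of clusters `k` as an input rather than inferring it, and interprets "strongest group membership" as the smallest Euclidean distance to the cluster centroid. The function names `kmeans` and `select_initial_samples` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length feature vectors."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
    return centroids, assign


def select_initial_samples(points, k, per_cluster=2, seed=0):
    """Return sorted indices of the samples closest to each centroid.

    Assumption: closeness to the centroid stands in for "strength of
    group membership"; these indices would be shown to the annotator
    first, giving the active learning loop its initial labels.
    """
    centroids, assign = kmeans(points, k, seed=seed)
    selected = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        members.sort(key=lambda i: math.dist(points[i], centroids[c]))
        selected.extend(members[:per_cluster])
    return sorted(selected)
```

In practice, `points` would hold one feature vector per waveform (for example, the extracted statistical features), and the returned indices identify the waveforms to send to the annotator before any classifier is trained.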
An active learning framework that accounts for the data conditions commonly found in maintenance and condition monitoring applications, while presenting the data in ways that are easy for human annotators to interpret, can facilitate the generation of reliable datasets. These datasets can, in turn, assist in the development of data-driven models that describe the many different processes that a machine undergoes.
