Abstract

Traditional supervised time series classification (TSC) tasks assume that all training data are labeled. However, in practice, manually labelling all unlabeled data could be very time-consuming and often requires the participation of skilled domain experts. In this paper, we concern with the positive unlabeled time series classification problem (PUTSC), which refers to automatically labelling the large unlabeled set U based on a small positive labeled set PL. The self-training (ST) is the most widely used method for solving the PUTSC problem and has attracted increased attention due to its simplicity and effectiveness. The existing ST methods simply employ the one-nearest-neighbor (1NN) formula to determine which unlabeled time-series should be labeled. Nevertheless, we note that the 1NN formula might not be optimal for PUTSC tasks because it may be sensitive to the initial labeled data located near the boundary between the positive and negative classes. To overcome this issue, in this paper we propose an exploratory methodology called ST-average. Unlike conventional ST-based approaches, ST-average utilizes the average sequence calculated by DTW barycenter averaging technique to label the data. Compared with any individuals in PL set, the average sequence is more representative. Our proposal is insensitive to the initial labeled data and is more reliable than existing ST-based methods. Besides, we demonstrate that ST-average can naturally be implemented along with many existing techniques used in original ST. Experimental results on public datasets show that ST-average performs better than related popular methods.

Highlights

  • With the rapid development of the Internet of Things technology, a large number of time series generated by sensor devices have appeared in various fields, including PM2.5 sensing systems [1], activity tracking [2], real-time patient-specific ECG classification [3], and many more

  • The Time-series classification (TSC) tasks in real-life often involve positive unlabeled TSC (PUTSC) [5,6], which we study in this paper

  • Facing the aforementioned drawback, inspired by [15] which uses the centroid of time series to improve TSC efficiency and accuracy, in this paper, we propose an exploratory methodology called self-training based on the average sequence of the time-series (ST-average)

Read more

Summary

Introduction

With the rapid development of the Internet of Things technology, a large number of time series generated by sensor devices have appeared in various fields, including PM2.5 sensing systems [1], activity tracking [2], real-time patient-specific ECG classification [3], and many more. ST-average, is different from all the ST based works in that: ST-average labels the time-series in U which is the most similar to the average sequence of the PL set as a positive data. We point out that traditional ST-based methods may be sensitive to the initial labeled time-series located near the boundary between the positive and negative classes. To overcome this issue, we propose a novel method ST-average to solve the PUTSC problem by using the average sequence of the PL set to decide which unlabeled time-series should be labeled and added into PL set.

Positive Unlabeled Time Series Classification
Self-Training Technique for the PUTSC
Related Work for the Self-Training
Motivation
Dynamic Time Warping
Time-Series Averaging
Time Complexity Analysis
Algorithms
The Performance Metric
Datasets
Implementation Details
F1-Score
Running Time
Findings
Conclusions and Future Work
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call