Semi-Supervised Risk Control via Prediction-Powered Inference.
The risk-controlling prediction sets (RCPS) framework is a general tool for transforming the output of any machine learning model to design a predictive rule with rigorous error rate control. The key idea behind this framework is to use labeled hold-out calibration data to tune a hyper-parameter that affects the error rate of the resulting prediction rule. However, the limitation of such a calibration scheme is that with limited hold-out data, the tuned hyper-parameter becomes noisy and leads to a prediction rule with an error rate that is often unnecessarily conservative. To overcome this sample-size barrier, we introduce a semi-supervised calibration procedure that leverages unlabeled data to rigorously tune the hyper-parameter without compromising statistical validity. Our procedure builds upon the prediction-powered inference framework, carefully tailoring it to risk-controlling tasks. We demonstrate the benefits and validity of our proposal through two real-data experiments: few-shot image classification and early time series classification.
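As a rough illustration of the calibration idea in this abstract, the sketch below combines model-imputed losses on unlabeled data with a correction term measured on labeled data, then scans candidate hyper-parameters. It is a minimal sketch under stated assumptions, not the paper's procedure: the function names, the CLT-based upper bound (with `z = 1.645` for a one-sided 95% level), and the assumption that `lambdas` is ordered from most to least conservative are all illustrative choices.

```python
import math
import numpy as np

def ppi_risk_upper_bound(loss_true_lab, loss_pred_lab, loss_pred_unlab, z=1.645):
    """Prediction-powered style risk estimate plus a one-sided CLT upper bound.

    loss_true_lab   : losses on labeled points, computed with the true labels
    loss_pred_lab   : losses on the same labeled points, computed with model-imputed labels
    loss_pred_unlab : losses on unlabeled points, computed with model-imputed labels
    """
    loss_true_lab = np.asarray(loss_true_lab, dtype=float)
    loss_pred_lab = np.asarray(loss_pred_lab, dtype=float)
    loss_pred_unlab = np.asarray(loss_pred_unlab, dtype=float)
    n, N = len(loss_true_lab), len(loss_pred_unlab)
    # The rectifier measures how far imputed losses are from true losses.
    rectifier = loss_true_lab - loss_pred_lab
    estimate = loss_pred_unlab.mean() + rectifier.mean()
    # Variance of the sum of two independent sample means.
    var = loss_pred_unlab.var(ddof=1) / N + rectifier.var(ddof=1) / n
    return estimate, estimate + z * math.sqrt(var)

def select_lambda(lambdas, L_true_lab, L_pred_lab, L_pred_unlab, alpha, z=1.645):
    """Scan lambdas (assumed ordered from most to least conservative).

    Column k of each loss matrix holds the per-sample losses at lambdas[k].
    Returns the last lambda whose upper bound stays at or below alpha,
    or None if even the first candidate fails.
    """
    chosen = None
    for k, lam in enumerate(lambdas):
        _, upper = ppi_risk_upper_bound(
            L_true_lab[:, k], L_pred_lab[:, k], L_pred_unlab[:, k], z
        )
        if upper > alpha:
            break
        chosen = lam
    return chosen
```

When the model's imputed labels are accurate, the rectifier has low variance, so the large unlabeled sample tightens the bound and a less conservative hyper-parameter can be selected.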
- Conference Article
29
- 10.1109/bibm.2012.6392654
- Oct 1, 2012
Early classification of time series has been receiving a lot of attention as of late, particularly in the context of gene expression. In the biomedical realm, early classification can be of tremendous help, by identifying the onset of a disease before it has time to fully take hold, or determining that a treatment has done its job and can be discontinued. In this paper we present a state-of-the-art model, which we call the Early Classification Model (ECM), that allows for early, accurate, and patient-specific classification of multivariate time series. The model integrates the widely used HMM and SVM models, a combination which, while not a new technique per se, has not been used for early classification of multivariate time series until now. It attained very promising results on the datasets we tested it on: in our experiments based on a published dataset of response to drug therapy in Multiple Sclerosis patients, ECM used only an average of 40% of a time series and was able to outperform some of the baseline models, which needed the full time series for classification.
- Research Article
91
- 10.1186/1471-2105-13-195
- Aug 8, 2012
- BMC Bioinformatics
Background: Early classification of time series is beneficial for biomedical informatics problems including, but not limited to, disease change detection. Early classification can be of tremendous help by identifying the onset of a disease before it has time to fully take hold. In addition, extracting patterns from the original time series helps domain experts to gain insights into the classification results. This problem has been studied recently using time series segments called shapelets. In this paper, we present a method, which we call Multivariate Shapelets Detection (MSD), that allows for early and patient-specific classification of multivariate time series. The method extracts time series patterns, called multivariate shapelets, from all dimensions of the time series that distinctly manifest the target class locally. The time series were classified by searching for the earliest closest patterns. Results: The proposed early classification method for multivariate time series has been evaluated on eight gene expression datasets from viral infection and drug response studies in humans. In our experiments, the MSD method outperformed the baseline methods, achieving highly accurate classification by using as little as 40%-64% of the time series. The obtained results provide evidence that using conventional classification methods on short time series is not as accurate as using the proposed methods specialized for early classification. Conclusion: For the early classification task, we proposed a method called Multivariate Shapelets Detection (MSD), which extracts patterns from all dimensions of the time series. We showed that the MSD method can classify the time series early by using as little as 40%-64% of the time series’ length.
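The "earliest closest pattern" search described in this abstract can be pictured with the sketch below. It is only an illustration under assumptions not stated in the abstract: each shapelet is given as a hypothetical `(pattern, threshold, label)` triple, the distance is plain Euclidean distance over all dimensions, and the function name `earliest_match` is made up for this example.

```python
import numpy as np

def earliest_match(series, shapelets):
    """Classify a multivariate series at the first time a shapelet matches.

    series    : array of shape (T, d), one row per time step
    shapelets : list of (pattern of shape (m, d), distance threshold, label)
    Returns (label, number of time steps used), or (None, T) if nothing matches.
    """
    series = np.asarray(series, dtype=float)
    T = series.shape[0]
    for t in range(1, T + 1):  # t = number of observations seen so far
        best = None
        for pattern, threshold, label in shapelets:
            pattern = np.asarray(pattern, dtype=float)
            m = pattern.shape[0]
            if t < m:
                continue  # not enough data yet for this shapelet
            # Only the window ending at t is new; earlier windows were
            # already checked at earlier values of t.
            window = series[t - m:t]
            dist = np.linalg.norm(window - pattern)
            if dist <= threshold and (best is None or dist < best[0]):
                best = (dist, label)
        if best is not None:
            return best[1], t
    return None, T
```

Among shapelets that match at the same time step, the sketch picks the closest one, so the decision is both as early as possible and tied to the nearest pattern.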
- Research Article
13
- 10.1504/ijdmb.2015.067955
- Jan 1, 2015
- International journal of data mining and bioinformatics
Early classification of time series has been receiving a lot of attention recently. In this paper we present a model, which we call the Early Classification Model (ECM), that allows for early, accurate and patient-specific classification of multivariate observations. ECM is comprised of an integration of the widely used Hidden Markov Model (HMM) and Support Vector Machine (SVM) models. It attained very promising results on the datasets we tested it on: in one set of experiments based on a published dataset of response to drug therapy in Multiple Sclerosis patients, ECM used only an average of 40% of a time series and was able to outperform some of the baseline models, which needed the full time series for classification. In the set of experiments tested on a sepsis therapy dataset, ECM was able to surpass the standard threshold-based method and the state-of-the-art method for early classification of multivariate time series.
- Research Article
46
- 10.1007/s10618-020-00690-z
- Jun 16, 2020
- Data Mining and Knowledge Discovery
Early time series classification (eTSC) is the problem of classifying a time series after as few measurements as possible with the highest possible accuracy. The most critical issue of any eTSC method is to decide when enough data of a time series has been seen to take a decision: Waiting for more data points usually makes the classification problem easier but delays the time in which a classification is made; in contrast, earlier classification has to cope with less input data, often leading to inferior accuracy. The state-of-the-art eTSC methods compute a fixed optimal decision time assuming that every time series has the same defined start time (like turning on a machine). However, in many real-life applications measurements start at arbitrary times (like measuring heartbeats of a patient), implying that the best time for taking a decision varies widely between time series. We present TEASER, a novel algorithm that models eTSC as a two-tier classification problem: In the first tier, a classifier periodically assesses the incoming time series to compute class probabilities. However, these class probabilities are only used as output label if a second-tier classifier decides that the predicted label is reliable enough, which can happen after a different number of measurements. In an evaluation using 45 benchmark datasets, TEASER is two to three times earlier at predictions than its competitors while reaching the same or an even higher classification accuracy. We further show TEASER’s superior performance using real-life use cases, namely energy monitoring, and gait detection.
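The two-tier structure described in this abstract can be sketched as a simple decision loop. This is not TEASER itself: the first tier is represented only by its precomputed class probabilities at each checkpoint, and the second tier is a hypothetical `accept` callable standing in for whatever reliability classifier is actually used. The fallback to the final prediction when nothing is accepted is also an assumption.

```python
from typing import Callable, Optional, Sequence, Tuple

def two_tier_early_classify(
    prefix_probs: Sequence[Sequence[float]],
    accept: Callable[[Sequence[float]], bool],
) -> Tuple[Optional[int], int]:
    """Return (label, checkpoint index) for the first accepted prediction.

    prefix_probs[k] holds the first-tier class probabilities at checkpoint k.
    accept(probs) plays the role of the second tier: True means the
    predicted label is judged reliable enough to output.
    """
    last = len(prefix_probs) - 1
    for k, probs in enumerate(prefix_probs):
        # First tier: most probable class at this checkpoint.
        label = max(range(len(probs)), key=probs.__getitem__)
        # Second tier: output only if accepted, or if no data remains.
        if accept(probs) or k == last:
            return label, k
    return None, -1  # empty input: nothing to classify

def margin_at_least(threshold: float) -> Callable[[Sequence[float]], bool]:
    """Hypothetical second tier: accept when the top-two margin is large."""
    def rule(probs: Sequence[float]) -> bool:
        top = sorted(probs, reverse=True)
        return (top[0] - top[1]) >= threshold
    return rule
```

Because the acceptance decision is made per series, different series can stop at different checkpoints, which is the property the abstract emphasizes.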
- Research Article
- 10.1145/3631531
- Dec 19, 2023
- ACM Transactions on Intelligent Systems and Technology
Time series data are ubiquitous in a variety of disciplines. Early classification of time series, which aims to predict the class label of a time series as early and accurately as possible, is a significant but challenging task in many time-sensitive applications. Existing approaches mainly utilize heuristic stopping rules to capture stopping signals from the prediction results of time series classifiers. However, heuristic stopping rules can only capture obvious stopping signals, which makes these approaches give either correct but late predictions or early but incorrect predictions. To tackle the problem, we propose a novel second-order confidence network for early classification of time series, which can automatically learn to capture implicit stopping signals in early time series in a unified framework. The proposed model leverages deep neural models to capture temporal patterns and outputs second-order confidence to reflect the implicit stopping signals. Specifically, our model exploits the data not only from a time step but also from the probability sequence to capture stopping signals. By combining stopping signals from the classifier output and the second-order confidence, we design a more robust trigger to decide whether or not to request more observations from future time steps. Experimental results show that our approach can achieve superior results in early classification compared to state-of-the-art approaches.
- Research Article
73
- 10.1109/tai.2020.3027279
- Aug 1, 2020
- IEEE Transactions on Artificial Intelligence
Early classification of time series has been extensively studied for minimizing class prediction delay in time-sensitive applications such as healthcare and finance. A primary task of an early classification approach is to classify an incomplete time series as soon as possible with some desired level of accuracy. Recent years have witnessed several approaches for early classification of time series. As most of the approaches have solved the early classification problem with different aspects, it becomes very important to make a thorough review of the existing solutions to know the current status of the area. These solutions have demonstrated reasonable performance in a wide range of applications including human activity recognition, gene expression based health diagnostic, industrial monitoring, and so on. In this paper, we present a systematic review of current literature on early classification approaches for both univariate and multivariate time series. We divide various existing approaches into four exclusive categories based on their proposed solution strategies. The four categories include prefix based, shapelet based, model based, and miscellaneous approaches. The authors also discuss the applications of early classification in many areas including industrial monitoring, intelligent transportation, and medical. Finally, we provide a quick summary of the current literature with future research directions.
- Conference Article
2
- 10.1109/ijcnn55064.2022.9892391
- Jul 18, 2022
Many approaches have been proposed for early classification of time series in light of its significance in a wide range of applications including healthcare, transportation and finance. Until now, the early classification problem has been dealt with by considering only irrevocable decisions. This paper introduces a new problem called early and revocable time series classification, where the decision maker can revoke its earlier decisions based on newly available measurements. In order to formalize and tackle this problem, we propose a new cost-based framework and derive two new approaches from it. The first approach does not explicitly consider the cost of changing a decision, while the second one does. Extensive experiments are conducted to evaluate these approaches on a large benchmark of real datasets. The empirical results obtained convincingly show (i) that the ability to revoke decisions significantly improves performance over the irrevocable regime, and (ii) that taking into account the cost of changing a decision brings even better results in general.
- Research Article
6
- 10.1007/s10618-021-00781-5
- Aug 16, 2021
- Data Mining and Knowledge Discovery
Early time series classification (EarlyTSC) involves the prediction of a class label based on partial observation of a given time series. Most EarlyTSC algorithms consider the trade-off between accuracy and earliness as two competing objectives, using a single dedicated hyperparameter. Obtaining insights into this trade-off requires finding a set of non-dominated (Pareto efficient) classifiers. So far, this has been approached through manual hyperparameter tuning. Since the trade-off hyperparameters only provide indirect control over the earliness-accuracy trade-off, manual tuning is tedious and tends to result in many sub-optimal hyperparameter settings. This complicates the search for optimal hyperparameter settings and forms a hurdle for the application of EarlyTSC to real-world problems. To address these issues, we propose an automated approach to hyperparameter tuning and algorithm selection for EarlyTSC, building on developments in the fast-moving research area known as automated machine learning (AutoML). To deal with the challenging task of optimising two conflicting objectives in early time series classification, we propose MultiETSC, a system for multi-objective algorithm selection and hyperparameter optimisation (MO-CASH) for EarlyTSC. MultiETSC can potentially leverage any existing or future EarlyTSC algorithm and produces a set of Pareto optimal algorithm configurations from which a user can choose a posteriori. As an additional benefit, our proposed framework can incorporate and leverage time-series classification algorithms not originally designed for EarlyTSC for improving performance on EarlyTSC; we demonstrate this property using a newly defined, “naïve” fixed-time algorithm. In an extensive empirical evaluation of our new approach on a benchmark of 115 data sets, we show that MultiETSC performs substantially better than baseline methods, ranking highest (avg. rank 1.98) compared to conceptually simpler single-algorithm (2.98) and single-objective alternatives (4.36).
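To make the notion of a non-dominated (Pareto efficient) set concrete, the sketch below filters a list of configurations scored on two objectives that are both minimised. The choice of `(earliness, error_rate)` as the pair of objectives and the function name `pareto_front` are assumptions for illustration; this is the generic dominance filter, not the MultiETSC system.

```python
def pareto_front(points):
    """Return indices of non-dominated points.

    points: list of (earliness, error_rate) pairs, both to be minimised.
    A point is dominated if another point is no worse on both objectives
    and strictly better on at least one.
    """
    front = []
    for i, (e_i, r_i) in enumerate(points):
        dominated = any(
            (e_j <= e_i and r_j <= r_i) and (e_j < e_i or r_j < r_i)
            for j, (e_j, r_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Four hypothetical configurations: (fraction of series used, error rate).
configs = [(0.2, 0.30), (0.5, 0.10), (0.6, 0.20), (0.9, 0.05)]
print(pareto_front(configs))  # [0, 1, 3]
```

Configuration 2 is dropped because configuration 1 is both earlier and more accurate; the remaining three represent different trade-offs from which a user could choose afterwards.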
- Research Article
7
- 10.1016/j.ipm.2023.103465
- Jul 24, 2023
- Information Processing & Management
Early time series classification is a variant of the time series classification task, in which a label must be assigned to the incoming time series as quickly as possible without necessarily screening through the whole sequence. It needs to be realized on the algorithmic level by fusing a decision-making method that detects the right moment to stop and a classifier that assigns a class label. The contribution addressed in this paper is twofold. Firstly, we present a new method for finding the best moment to perform an action (terminate/continue). Secondly, we propose a new learning scheme using classifier calibration to estimate classification accuracy. The new approach, called CALIMERA, is formalized as a cost minimization problem. Using two benchmark methodologies for early time series classification, we have shown that the proposed model achieves better results than the current state-of-the-art. Two most serious competitors of CALIMERA are ECONOMY and TEASER. The empirical comparison showed that the new method achieved a higher accuracy than TEASER for 35 out of 45 datasets and it outperformed ECONOMY in 20 out of 34 datasets.
- Research Article
30
- 10.1109/access.2019.2929644
- Jan 1, 2019
- IEEE Access
Early classification of time series aims to predict the class value of a sequence accurately and as early as possible, without waiting for the full-length data. This is significant in many time-sensitive applications and has attracted great interest in recent years. For instance, early diagnosis can help patients get early treatment and even save their lives. The problem of early classification is how to determine whether the collected data are sufficient to output the class value. Moreover, in practical applications, users also need to know the confidence (reliability) of the prediction results for more appropriate processing. For example, giving a healthy patient the possibility of suffering from some disease can assist physicians in choosing an optimal therapy. However, existing work has not provided an effective measure to indicate how accurate the classification is. Therefore, in this paper, we propose an effective confidence-based method for early classification of time series. Firstly, based on a set of base time series classifiers trained at different timestamps, we propose a dynamic decision fusion method to measure the confidence of a predicted result by fusing the results of multiple base classifiers. Secondly, by analyzing the distribution of confidence values, we develop an adaptive learning method for the confidence threshold to simultaneously optimize the two conflicting objectives: accuracy and earliness. Finally, experiments conducted on 45 equal-length datasets and 8 variable-length datasets clearly show that our proposed approach can achieve superior results in early classification compared to state-of-the-art approaches.
- Conference Article
6
- 10.1109/iccasit55263.2022.9986835
- Oct 12, 2022
Early time series classification is of great significance for time-sensitive applications such as fault detection and earthquake prediction. This task aims to classify time series with the least timestamps at desired accuracy. Recent deep learning methods usually used the Recurrent Neural Networks (RNNs) as the classification backbone and the exiting subnet for early quitting. However, the RNNs suffer from the 'forgetting' defect and insufficient local feature extraction. Besides, the balance between earliness and accuracy is not fully considered. In this paper, a framework named TCN-Transformer is proposed. To overcome the defects of RNNs, we combined Temporal Convolutional Network and Transformer to extract both local and global features. Then, a loss function is designed to ensure the classification performance, while focusing more on earlier features. The experimental results on ten univariate datasets.
- Conference Article
1
- 10.1109/iccr55715.2022.10053847
- Dec 2, 2022
Early time series classification (ETSC) is of great significance for time-sensitive applications such as disaster prediction and gas leak detection. This task aims to classify time series with the least timestamps at desired accuracy. Recently, deep learning methods in ETSC usually used Convolutional Neural Networks to extract local features from fixed-length sequences for classification, and then set a threshold according to extensive expert experience for early exiting. However, the vanilla convolution operations cannot adapt to the data characteristics effectively. Moreover, the length variability of samples is also underestimated. To handle these problems, a dynamic convolution strategy is proposed to generate data adaptive convolution kernels for different samples. Moreover, we use random truncation based data augmentation techniques to enhance the convolution kernels in adapting to the data length variability. Experimental results on eight univariate datasets demonstrate the promising superiority of the proposed method.
- Conference Article
1
- 10.1145/3478905.3478971
- Jul 23, 2021
The purpose of early classification of time series is to predict the class label of a time series before it has been collected completely, which is valuable in financial fields with strict timeliness requirements. Current financial analysis techniques, such as methods based on the Support Vector Machine and Naive Bayes, need to analyze complete data to get results, which may delay managers in supervising the market. Therefore, we propose a decision-making method for futures trading using dictionary-based early classification of time series. Specifically, we train a group of base classifiers at different timestamps. Each classifier extracts subsequences along a sliding window to construct the bag-of-patterns, and then uses a logistic regression model for classification. In addition, the main task of early classification of time series is to determine the earliest time of reliable classification. Thus, based on the idea of dynamic decision fusion, we combine the number of classifiers, the prediction results of different classifiers, and the value of the conflict function between earliness and accuracy, and select the best number of classifiers and the reliability threshold, which together determine the time of reliable output. Consequently, we obtain an algorithm for finding the earliest time of reliable classification. Experimental results on different futures datasets show that, compared with currently popular financial analysis techniques, our method classifies futures data after seeing only about 60% of the complete series, which helps managers at financial regulatory authorities start making decisions about 40% earlier, leaving more time for judging and guiding decisions. In terms of accuracy, our method also achieves better performance.
- Research Article
8
- 10.1016/j.neucom.2022.02.044
- Feb 24, 2022
- Neurocomputing
Few-shot image classification with composite rotation based self-supervised auxiliary task
- Research Article
7
- 10.1109/tkde.2021.3108580
- Jan 1, 2021
- IEEE Transactions on Knowledge and Data Engineering
Since its introduction two decades ago, there has been increasing interest in the problem of early classification of time series. This problem generalizes classic time series classification to ask if we can classify a time series subsequence with sufficient accuracy and confidence after seeing only some prefix of a target pattern. The idea is that the earlier classification would allow us to take immediate action, in a domain in which some practical interventions are possible. For example, that intervention might be sounding an alarm or applying the brakes in an automobile. In this work, we make a surprising claim. In spite of the fact that there are dozens of papers on early classification of time series, it is not clear that any of them could ever work in a real-world setting. The problem is not with the algorithms per se but with the vague and underspecified problem description. Essentially all algorithms make implicit and unwarranted assumptions about the problem that will ensure that they will be plagued by false positives and false negatives even if their results suggested that they could obtain near-perfect results. We will explain our findings with novel insights and experiments and offer recommendations to the community.