Abstract

Conformal predictors use machine learning models to output prediction sets. For regression, a prediction set is simply a prediction interval. All conformal predictors are valid, meaning that the error rate on novel data is bounded by a preset significance level. The key performance metric for conformal predictors is therefore their efficiency, i.e., the size of the prediction sets. Inductive conformal predictors obtain their prediction regions using real-valued functions, called nonconformity functions, together with a calibration set, i.e., a set of labeled instances not used for model training. In state-of-the-art conformal regressors, the nonconformity functions are normalized, i.e., they include a component estimating the difficulty of each instance. In this study, conformal regressors are built on top of ensembles of bagged neural networks, and several nonconformity functions are evaluated. In addition, the option to calibrate on out-of-bag instances, instead of setting aside a separate calibration set, is investigated. The experiments, using 33 publicly available data sets, show that normalized nonconformity functions can produce smaller prediction sets, but that the efficiency is highly dependent on the quality of the difficulty estimation. Specifically, in this study, the most efficient normalized nonconformity function estimated the difficulty of an instance by calculating the average error of neighboring instances. These results are consistent with previous studies using random forests as underlying models. Calibrating on out-of-bag instances did, however, lead to more efficient conformal predictors only on smaller data sets, which is in sharp contrast to the random forest study, where out-of-bag calibration was significantly better overall.
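
As an illustration of the procedure summarized above, the sketch below implements an inductive conformal regressor with a normalized nonconformity function: the underlying model is an ensemble of bagged neural networks, and the difficulty of an instance is estimated as the average absolute error of its k nearest training neighbors, the most efficient approach in the study. This is a minimal sketch under assumed settings, not the paper's implementation: the ensemble configuration, the number of neighbors k, the sensitivity parameter beta, the calibration split, and the helper fit_icp are all illustrative, and it calibrates on a held-out set rather than on out-of-bag instances.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.neural_network import MLPRegressor

def fit_icp(X, y, epsilon=0.1, k=5, beta=0.01):
    """Fit an inductive conformal regressor; returns an interval predictor
    with error rate bounded by epsilon (hyperparameters are illustrative)."""
    # Set aside a calibration set not used for model training.
    X_tr, X_cal, y_tr, y_cal = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # Underlying model: an ensemble of bagged neural networks
    # (sklearn >= 1.2; older versions use base_estimator=).
    model = BaggingRegressor(
        estimator=MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000),
        n_estimators=10, random_state=0).fit(X_tr, y_tr)

    # Difficulty estimate: average absolute training error of the k nearest
    # neighbors; beta > 0 avoids division by zero and tempers normalization.
    train_err = np.abs(y_tr - model.predict(X_tr))
    nn = NearestNeighbors(n_neighbors=k).fit(X_tr)

    def difficulty(X_q):
        _, idx = nn.kneighbors(X_q)
        return train_err[idx].mean(axis=1) + beta

    # Normalized nonconformity scores on the calibration set:
    # alpha_i = |y_i - y_hat_i| / sigma_i.
    alphas = np.abs(y_cal - model.predict(X_cal)) / difficulty(X_cal)

    # Conformal quantile: the ceil((1 - epsilon)(n + 1))-th smallest score
    # (clamped to the largest score for very small calibration sets).
    n = len(alphas)
    q = min(np.ceil((1 - epsilon) * (n + 1)) / n, 1.0)
    alpha_s = np.quantile(alphas, q, method="higher")

    def predict_interval(X_q):
        # Interval: y_hat +/- alpha_s * sigma(x); harder instances get
        # wider intervals, easier ones narrower intervals.
        y_hat = model.predict(X_q)
        half = alpha_s * difficulty(X_q)
        return y_hat - half, y_hat + half

    return predict_interval

if __name__ == "__main__":
    # Synthetic heteroscedastic data, so the normalization has an effect.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(500, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.1 + 0.1 * np.abs(X[:, 0]))
    lower, upper = fit_icp(X, y, epsilon=0.1)(X[:5])
```

The normalization is what allows efficiency gains: the same calibration quantile yields wider intervals for instances in difficult regions and narrower ones in easy regions, so when the difficulty estimate is accurate, the average interval size shrinks without violating validity.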
