Enhancing Open-Set Recognition using Clustering-based Extreme Value Machine (C-EVM)

James Henrydoss, Manuel Gunther, Terrance E Boult, Steve Cruz, Chunchun Li

doi:10.1109/bigdata50022.2020.9378012

Abstract

In real-world deployments, machine learning applications face challenges when processing ever-increasing volumes of data: the real world is open and often presents data from classes not seen during training. Open-set recognition is a growing area of machine learning that addresses such problems. This work advances the state of the art in open-set recognition, the Extreme Value Machine (EVM), with a novel clustering-based extension (C-EVM) applied during training to improve end-to-end prediction performance. The C-EVM combines clustering based on DBSCAN (density-based spatial clustering of applications with noise) with a novel Nearby Clusters (NC) algorithm during model fitting to reduce computation while improving accuracy. Our experiments show a statistically significant improvement of 5-10% in macro F1-score over the state-of-the-art EVM in open-set testing on the KDD CUP-99 data set. Past work on open-set recognition often gained open-set robustness at the cost of closed-set accuracy, whereas the C-EVM outperforms the EVM in both closed-set and open-set recognition. When tested on subsets of ImageNet-2012 with varying numbers of classes, the C-EVM outperforms the EVM by a statistically significant margin when using deep features. This work also introduces a parameterless C-EVM variant based on Hierarchical DBSCAN (HDBSCAN) that scales well to large data sets. Finally, both EVM and C-EVM can operate as kernel-free incremental learners, making these open-set multi-class classifiers useful for streaming and big data applications.
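To make the clustering step mentioned in the abstract more concrete, the following is a minimal sketch of density-based clustering applied to each class separately. It is not the authors' implementation: the abstract does not specify how the C-EVM uses the resulting clusters or how the Nearby Clusters algorithm works, so neither is reproduced here. The brute-force `dbscan` function, the helper `class_centroids`, the choice to summarize each cluster by its centroid, and the default `eps` and `min_samples` values are all assumptions made for illustration.

```python
import math
from collections import defaultdict

NOISE = -1


def dbscan(points, eps, min_samples):
    """Minimal brute-force DBSCAN (illustrative, O(n^2) per query pass).

    Returns one label per point; -1 marks noise.
    """
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # A point counts as its own neighbor (distance 0).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


def class_centroids(features, targets, eps=0.5, min_samples=3):
    """Hypothetical helper: cluster each class on its own and return
    {class_label: [centroid, ...]}, dropping noise points.

    Summarizing clusters by centroids is an assumption of this sketch.
    """
    by_class = defaultdict(list)
    for x, y in zip(features, targets):
        by_class[y].append(x)

    result = {}
    for y, pts in by_class.items():
        labels = dbscan(pts, eps, min_samples)
        groups = defaultdict(list)
        for p, lab in zip(pts, labels):
            if lab != NOISE:
                groups[lab].append(p)
        result[y] = [
            tuple(sum(coord) / len(g) for coord in zip(*g))
            for g in groups.values()
        ]
    return result
```

Reducing each class to a handful of cluster summaries is one plausible way such a step could cut the number of points that a later model-fitting stage has to consider, which is consistent with the abstract's stated goal of reducing computation. The actual C-EVM procedure should be taken from the full text.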

Full Text