Abstract

In multi-class text classification, the performance (effectiveness) of a classifier is usually measured by micro-averaged and macro-averaged F1 scores. However, the scores themselves do not tell us how reliable they are as forecasts of the classifier's future performance on unseen data. In this paper, we propose a novel approach to explicitly modelling the uncertainty of average F1 scores through Bayesian reasoning, and demonstrate that it can provide a much more comprehensive performance comparison between text classifiers than the traditional frequentist null hypothesis significance testing (NHST).
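
To make the idea concrete, the sketch below shows one way such Bayesian reasoning over average F1 scores could be realised: place a Dirichlet prior over the cells of the test-set confusion matrix and propagate posterior samples of the cell probabilities through the micro- and macro-F1 formulas. This is a minimal illustration under that assumption, not necessarily the model used in the paper; the helper name `posterior_f1_samples`, the uniform prior of 1.0, and the example confusion matrices are all hypothetical.

```python
import numpy as np

def posterior_f1_samples(confusion, n_samples=10000, prior=1.0, rng=None):
    """Draw posterior samples of micro- and macro-averaged F1.

    confusion: (K, K) array of counts, rows = true class, cols = predicted class.
    Assumes a symmetric Dirichlet(prior) over the K*K confusion-matrix cells.
    """
    rng = np.random.default_rng(rng)
    counts = confusion.flatten().astype(float) + prior      # Dirichlet posterior parameters
    probs = rng.dirichlet(counts, size=n_samples)           # sampled joint cell probabilities
    probs = probs.reshape(n_samples, *confusion.shape)

    tp = np.einsum('nii->ni', probs)                         # diagonal: per-class true-positive mass
    pred_mass = probs.sum(axis=1)                            # column sums: predicted-class mass
    true_mass = probs.sum(axis=2)                            # row sums: true-class mass

    precision = tp / np.clip(pred_mass, 1e-12, None)
    recall = tp / np.clip(true_mass, 1e-12, None)
    per_class_f1 = 2 * precision * recall / np.clip(precision + recall, 1e-12, None)

    macro_f1 = per_class_f1.mean(axis=1)
    # For single-label multi-class data, micro-F1 equals accuracy = total diagonal mass.
    micro_f1 = tp.sum(axis=1)
    return micro_f1, macro_f1

# Illustrative comparison of two classifiers: posterior probability that A beats B on macro-F1.
conf_a = np.array([[80, 5, 3], [6, 70, 4], [2, 5, 60]])
conf_b = np.array([[78, 7, 3], [8, 68, 4], [3, 6, 58]])
_, macro_a = posterior_f1_samples(conf_a, rng=0)
_, macro_b = posterior_f1_samples(conf_b, rng=1)
print("P(macro-F1 of A > macro-F1 of B):", (macro_a > macro_b).mean())
```

Unlike an NHST p-value, the resulting posterior probability directly answers "how likely is classifier A to outperform classifier B on unseen data", which is the kind of comparison the abstract refers to.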
