Deconstructing Cross-Entropy for Probabilistic Binary Classifiers.

Daniel Ramos, Joaquin Gonzalez-Rodriguez, Alicia Lozano-Diez, Javier Franco-Pedroso

doi:10.3390/e20030208


Open Access

https://doi.org/10.3390/e20030208


Journal: Entropy	Publication Date: Mar 20, 2018
Citations: 69	License type: CC BY 4.0

Affiliation: Autonomous University of Madrid

Abstract

In this work, we analyze the cross-entropy function, widely used in classifiers both as a performance measure and as an optimization objective. We contextualize cross-entropy in the light of Bayesian decision theory, the formal probabilistic framework for making decisions, and we thoroughly analyze its motivation, meaning and interpretation from an information-theoretical point of view. In this sense, this article presents several contributions: First, we explicitly analyze the contribution to cross-entropy of (i) prior knowledge; and (ii) the value of the features in the form of a likelihood ratio. Second, we introduce a decomposition of cross-entropy into two components: discrimination and calibration. This decomposition enables the measurement of different performance aspects of a classifier in a more precise way; and justifies previously reported strategies to obtain reliable probabilities by means of the calibration of the output of a discriminating classifier. Third, we give different information-theoretical interpretations of cross-entropy, which can be useful in different application scenarios, and which are related to the concept of reference probabilities. Fourth, we present an analysis tool, the Empirical Cross-Entropy (ECE) plot, a compact representation of cross-entropy and its aforementioned decomposition. We show the power of ECE plots, as compared to other classical performance representations, in two diverse experimental examples: a speaker verification system, and a forensic case where some glass findings are present.

Highlights

Probabilistic approaches for data mining, machine learning and pattern recognition have proven their effectiveness both theoretically and practically in multiple applications [1].
This article presents contributions to the general fields of pattern recognition, data mining and machine learning. One of them is to make explicit the two, typically independent, sources of information that affect cross-entropy: prior knowledge and the value of the features. Previous approaches using cross-entropy in classifiers, such as [3,5,6], do not typically take these into account; most commonly they use only the empirical prior probability.
Other related measures such as Confusion Entropy (CEN) or Matthews Correlation Coefficient (MCC) [41,42] work with decision errors rather than probabilities, which implies the selection of a threshold τ, and they do not consider performance at different prior probabilities either.

Summary

Introduction

Probabilistic approaches for data mining, machine learning and pattern recognition have proven their effectiveness both theoretically and practically in multiple applications [1]. Probabilistic outputs of systems have proven to be useful in many other research and application areas such as clinical decision support systems [9], cognitive psychology [10,11], biometric systems [12,13,14], weather forecasting [15] and forensic science [16,17]. In all those areas, as with classifiers in general, Bayesian decision theory [1] constitutes the formal framework for making optimal choices of courses of action. Apart from its advantages for general classifiers [13], this analysis is of particular interest in many applications such as forensic science [17,24], where prior probabilities and likelihood ratios are computed by different agents, with different responsibilities in the decision process. We generalize to two-class probabilistic classifiers the use of an analysis tool, the Empirical Cross-Entropy (ECE) plot, previously used in forensic science [16,25].
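To make the separate roles of the prior and the likelihood ratio concrete, the following is a minimal sketch of an empirical cross-entropy computed at a chosen prior. It assumes the usual prior-weighted form, in which each labelled trial is penalized by the negative base-2 logarithm of the posterior probability of its true class, and the posterior is obtained from prior odds times likelihood ratio. The function name, the argument names and the toy likelihood ratios are hypothetical and are not taken from the article.

```python
import math


def empirical_cross_entropy(lrs_class1, lrs_class2, prior_class1):
    """Empirical cross-entropy in bits at one prior (illustrative sketch).

    lrs_class1: likelihood ratios of trials whose true class is class 1
    lrs_class2: likelihood ratios of trials whose true class is class 2
    prior_class1: assumed prior probability of class 1, strictly in (0, 1)
    """
    prior_odds = prior_class1 / (1.0 - prior_class1)
    # Posterior odds = LR * prior odds, so for a class-1 trial
    # -log2 P(class 1 | x) = log2(1 + 1 / (LR * prior_odds)).
    cost1 = sum(math.log2(1.0 + 1.0 / (lr * prior_odds))
                for lr in lrs_class1) / len(lrs_class1)
    # For a class-2 trial, -log2 P(class 2 | x) = log2(1 + LR * prior_odds).
    cost2 = sum(math.log2(1.0 + lr * prior_odds)
                for lr in lrs_class2) / len(lrs_class2)
    return prior_class1 * cost1 + (1.0 - prior_class1) * cost2


# Hypothetical toy likelihood ratios, for illustration only.
toy_lrs_class1 = [8.0, 3.0, 0.5]
toy_lrs_class2 = [0.1, 0.4, 2.0]

for prior in (0.1, 0.5, 0.9):
    ece = empirical_cross_entropy(toy_lrs_class1, toy_lrs_class2, prior)
    # A system that always reports LR = 1 adds nothing to the prior;
    # its cross-entropy equals the entropy of the prior.
    neutral = empirical_cross_entropy([1.0], [1.0], prior)
    print(f"prior={prior:.1f}  ECE={ece:.3f} bits  neutral={neutral:.3f} bits")
```

Evaluating the function over a grid of priors gives one curve of the kind an ECE plot displays. The prior is an argument, separate from the likelihood ratios, which mirrors the setting described above where the two quantities are supplied by different agents.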

Review of Bayesian Decision Theory

Optimal Bayesian Decisions

Empirical Performance of Probabilistic Classifiers

Calibration of Probabilities

Cross-Entropy

Proposed Measure of Accuracy

Choosing a Reference Probability Distribution for Intuitive Interpretation

Oracle Reference Po

PAV-Calibrated Reference Pcal

The ECE Plot

Speaker Verification

Forensic Case Involving Glass Findings

Discussion


