Abstract

There are various definitions of mutual information. Essentially, these definitions can be divided into two classes: (1) definitions with random variables and (2) definitions with ensembles. However, there are some mathematical flaws in these definitions. For instance, Class 1 definitions either neglect the probability spaces or assume that the two random variables share the same probability space. Class 2 definitions redefine the marginal probabilities from the joint probabilities. In fact, the marginal probabilities are given by the ensembles and should not be redefined from the joint probabilities. Both Class 1 and Class 2 definitions assume that a joint distribution exists, yet they all ignore an important fact: the joint distribution, or the joint probability measure, is not unique. In this paper, we first present a new unified definition of mutual information that covers the various existing definitions and fixes their mathematical flaws. Our idea is to define the joint distribution of two random variables by taking the marginal probabilities into consideration. Next, we establish some properties of the newly defined mutual information. We then propose a method to calculate mutual information in machine learning. Finally, we apply our newly defined mutual information to credit scoring.
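
To make the marginal-consistency point concrete, the following is a minimal sketch in standard notation (not the paper's unified definition): for finite discrete X and Y with given marginals p_X and p_Y, any admissible joint distribution must reproduce those marginals, and in general more than one joint distribution does so.

\[
  \sum_{j} p(x_i, y_j) = p_X(x_i), \qquad \sum_{i} p(x_i, y_j) = p_Y(y_j),
\]
\[
  I(X;Y) \;=\; \sum_{i}\sum_{j} p(x_i, y_j)\,\log \frac{p(x_i, y_j)}{p_X(x_i)\,p_Y(y_j)}.
\]

For example, the independent coupling p(x_i, y_j) = p_X(x_i) p_Y(y_j) and many dependent couplings share the same marginals, which is why the joint distribution is not determined by the marginals alone.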

Highlights

  • Mutual information has emerged in recent years as an important measure of statistical dependence

  • Our idea is to define the joint distribution of two random variables by taking the marginal probabilities into consideration

  • We propose a method to calculate mutual information in machine learning


Summary

Introduction

Mutual information has emerged in recent years as an important measure of statistical dependence. Shannon first introduced a concept called entropy for a single discrete chance variable. He defined the joint entropy and conditional entropy for two discrete chance variables using the joint distribution. Class 1 definitions of mutual information depend on the joint distribution of two random variables. Pinsker ([6], 1960 and 1964) treated the fundamental concepts of Shannon in a more advanced manner by employing probability theory. His definition of mutual information was more general in that he implicitly assumed the two random variables had different probability spaces. Class 2 definitions depend on the joint probability measure on the joint sample space of two ensembles. Among such definitions, Fano ([9], 1961), Abramson ([10], 1963), and Gallager ([11], 1968) developed their definitions in a similar way. Throughout the paper, we restrict our focus to mutual information for finite discrete random variables.
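
For the finite discrete case considered throughout the paper, the following is a minimal Python sketch of the classical (Shannon) mutual information computed from a joint probability table, with an explicit check that the joint reproduces the given marginals. It is illustrative only: the function name and the consistency check are our own, and it does not implement the unified definition developed in this paper.

import numpy as np

def mutual_information(joint, px, py, eps=1e-12):
    """Classical discrete mutual information I(X;Y) in bits.

    joint[i, j] is P(X = x_i, Y = y_j); px and py are the given
    marginal distributions of X and Y, treated as fixed inputs.
    """
    joint = np.asarray(joint, dtype=float)
    px = np.asarray(px, dtype=float)
    py = np.asarray(py, dtype=float)

    # An admissible joint distribution must reproduce the given marginals.
    if not (np.allclose(joint.sum(axis=1), px) and np.allclose(joint.sum(axis=0), py)):
        raise ValueError("joint distribution is inconsistent with the given marginals")

    # I(X;Y) = sum_{i,j} p(x_i, y_j) * log2( p(x_i, y_j) / (p_X(x_i) p_Y(y_j)) ),
    # with the convention 0 * log 0 = 0 (terms with p(x_i, y_j) = 0 are skipped).
    outer = np.outer(px, py)
    mask = joint > eps
    return float(np.sum(joint[mask] * np.log2(joint[mask] / outer[mask])))

# Example: independent X and Y give I(X;Y) = 0.
px = np.array([0.5, 0.5])
py = np.array([0.25, 0.75])
print(mutual_information(np.outer(px, py), px, py))  # 0.0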

Basic Concepts in Probability Theory
Shannon’s Original Definition
Class 1 Definitions
A New Unified Definition of Mutual Information
Newly Defined Mutual Information in Machine Learning
Conclusions