Improved mutual information measure for clustering, classification, and community detection.

M E J Newman, George T Cantwell, Jean-Gabriel Young

doi:10.1103/physreve.101.042304

Open Access

Journal: Physical Review E
Publication Date: Apr 23, 2020
Citations: 40
License type: publisher-specific, author manuscript

Affiliation: University of Michigan–Ann Arbor

Abstract

The information theoretic measure known as mutual information is widely used as a way to quantify the similarity of two different labelings or divisions of the same set of objects, such as arises, for instance, in clustering and classification problems in machine learning or community detection problems in network science. Here we argue that the standard mutual information, as commonly defined, omits a crucial term which can become large under real-world conditions, producing results that can be substantially in error. We derive an expression for this missing term and hence write a corrected mutual information that gives accurate results even in cases where the standard measure fails. We discuss practical implementation of the new measure and give example applications.
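
To make the quantity under discussion concrete, the following is a minimal Python sketch of the standard (uncorrected) mutual information between two labelings of the same objects, computed from their contingency table. It does not implement the corrected measure, whose formula the abstract does not state. The function names `mutual_information` and `entropy`, the use of natural logarithms, and the toy labelings are assumptions made here for illustration; they are not taken from the paper.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b):
    """Standard (uncorrected) mutual information, in nats, of two labelings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same set of objects")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # contingency table entries
    count_a = Counter(labels_a)               # row sums
    count_b = Counter(labels_b)               # column sums
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) )
        mi += (c / n) * log(c * n / (count_a[a] * count_b[b]))
    return mi


def entropy(labels):
    """Shannon entropy, in nats, of a single labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


# Illustrative data (hypothetical): six objects in two true groups.
truth = [0, 0, 0, 1, 1, 1]
# A labeling that puts every object in its own group.
singletons = list(range(len(truth)))

mi_singletons = mutual_information(truth, singletons)
mi_perfect = mutual_information(truth, truth)
h_truth = entropy(truth)
```

In this toy case `mi_singletons` equals `h_truth`, the same score that the perfect labeling `truth` receives, even though placing every object in its own group reveals nothing about the true grouping. This is one simple situation in which the standard measure can be misleading; it is offered as an illustration and is not claimed to be the paper's own example.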

Full Text