Estimation of mutual information by the fuzzy histogram

Maryam Amir Haeri, Mohammad Mehdi Ebadzadeh

doi:10.1007/s10700-014-9178-0

Abstract

Mutual information (MI) is an important measure of dependency between random variables because of its tight connection with information theory. It has numerous applications in both theory and practice. In practice, however, the MI usually has to be estimated from available data. Several methods exist for approximating the MI, and the histogram-based approach is arguably one of the simplest and most widespread. This paper proposes fuzzy partitioning for histogram-based MI estimation. It uses a general form of fuzzy membership functions that includes the class of crisp membership functions as a special case. We show that the average absolute error of the fuzzy-histogram method is lower than that of the naive histogram method. Moreover, the accuracy of our technique is comparable to, and in some cases better than, that of kernel density estimation (KDE), which is one of the best MI estimation methods, while its computational cost is significantly lower than that of KDE. We investigate the new estimator with respect to average error, bias and variance, and explore its usefulness in a real-world bioinformatics application. Our experiments show that, in contrast to the naive histogram MI estimator, the fuzzy-histogram MI estimator is able to reveal all dependencies among the gene-expression data.
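
To make the idea of a fuzzy histogram concrete, the following is a minimal sketch and not the authors' implementation. It assumes equal-width triangular membership functions centered on a uniform grid, whereas the paper works with a more general family. The function names (`triangular_memberships`, `crisp_memberships`, `mutual_information`) and the choice of natural logarithms are illustrative assumptions. Each sample contributes fractional counts to neighboring bins instead of a single count to one bin; the crisp histogram is recovered when the memberships are one-hot.

```python
import numpy as np

def triangular_memberships(x, n_bins):
    """Fuzzy partition (assumed triangular): (n_samples, n_bins) membership degrees.

    Bin centers lie on a uniform grid over [min(x), max(x)], so every row sums to 1.
    Requires n_bins >= 2 and a non-constant sample.
    """
    x = np.asarray(x, dtype=float)
    centers = np.linspace(x.min(), x.max(), n_bins)
    width = centers[1] - centers[0]
    # Membership decays linearly from 1 at the center to 0 at the neighboring centers.
    return np.clip(1.0 - np.abs(x[:, None] - centers[None, :]) / width, 0.0, None)

def crisp_memberships(x, n_bins):
    """Crisp special case: one-hot memberships over equal-width bins."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.digitize(x, edges[1:-1])  # bin index in 0 .. n_bins - 1
    return np.eye(n_bins)[idx]

def mutual_information(mx, my):
    """Plug-in MI estimate (in nats) from two membership matrices."""
    n = mx.shape[0]
    pxy = mx.T @ my / n                      # joint (fuzzy) relative frequencies
    px = pxy.sum(axis=1, keepdims=True)      # marginal of X
    py = pxy.sum(axis=0, keepdims=True)      # marginal of Y
    mask = pxy > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

# Usage: compare the fuzzy and crisp estimates on synthetic dependent data.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x + 0.1 * rng.normal(size=2000)
mi_fuzzy = mutual_information(triangular_memberships(x, 10), triangular_memberships(y, 10))
mi_crisp = mutual_information(crisp_memberships(x, 10), crisp_memberships(y, 10))
print(mi_fuzzy, mi_crisp)
```

Because both partitions feed the same `mutual_information` function, swapping the membership constructor is the only change needed to move between the naive and the fuzzy estimator in this sketch. The number of bins is a free parameter here and is not taken from the paper.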

Full Text