Abstract

Histograms have been used extensively as a simple tool for nonparametric probability density function estimation. In practice, however, the accuracy of histogram-derived quantities, such as the marginal entropy (ME), the joint entropy (JE), or the mutual information (MI), depends on the number of bins chosen for the histogram. In this paper, we investigate the binning problem of the bivariate histogram (bi-histogram) for JE estimation. By minimizing a theoretical mean square error (MSE) of the JE estimate, we derive a new formula for the optimal number of bins of a bivariate histogram for continuous random variables. This novel JE estimator is then used in MI estimation to avoid the accumulation of errors in the joint MI between the class variable and the feature subset during feature selection. On a synthetic Gaussian feature selection problem, only the proposed method retrieves the exact number of relevant features that explain the class variable, in contrast to a competing univariate estimator based on a binning formula originally proposed for ME estimation. In speech and speaker recognition applications, the proposed method selects a limited number of features that guarantees approximately the same, or an even better, recognition rate than using the full feature set.
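The abstract does not reproduce the paper's optimal-bin formula, but the underlying estimators it builds on are standard. As a minimal sketch (assuming natural-log entropies and an arbitrary, user-supplied bin count rather than the paper's MSE-optimal choice), histogram-based JE and MI estimation can be written as:

```python
import numpy as np

def joint_entropy(x, y, bins):
    """Plug-in estimate of H(X, Y) in nats from a 2-D histogram.

    `bins` is the number of bins per axis; the paper derives an
    MSE-optimal value for it, which is not reproduced here.
    """
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]                      # drop empty cells to avoid log(0)
    return -np.sum(p * np.log(p))

def marginal_entropy(v, bins):
    """Plug-in estimate of H(V) in nats from a 1-D histogram."""
    counts, _ = np.histogram(v, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(x, y, bins):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), all from histograms with `bins` bins."""
    return (marginal_entropy(x, bins)
            + marginal_entropy(y, bins)
            - joint_entropy(x, y, bins))
```

A strongly dependent pair should yield a clearly larger MI estimate than an independent pair, which is the property the feature-selection procedure exploits.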
