Abstract

We consider a nonparametric Generative Tree Model and discuss the problem of selecting active predictors for the response in this scenario. We investigate two popular information-based selection criteria: Conditional Infomax Feature Extraction (CIFE) and Joint Mutual Information (JMI), both of which are derived as approximations of the Conditional Mutual Information (CMI) criterion. We show that CIFE and JMI may exhibit behavior different from that of CMI, resulting in different orders in which predictors are chosen in the variable selection process. Explicit formulae for CMI and its two approximations in the Generative Tree Model are obtained. As a byproduct, we establish expressions for the entropy of a multivariate Gaussian mixture and its mutual information with the mixing distribution.
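
For orientation, the standard greedy forms of these criteria in the feature-selection literature (our notation; the paper's own equation numbering is not reproduced here) score a candidate predictor $X_k$ given the response $Y$ and the already selected set $S$ as

    \mathrm{CMI}:\quad J(X_k) = I(X_k; Y \mid X_S),
    \mathrm{CIFE}:\quad J(X_k) = I(X_k; Y) - \sum_{j \in S}\bigl[ I(X_k; X_j) - I(X_k; X_j \mid Y) \bigr],
    \mathrm{JMI}:\quad J(X_k) = I(X_k; Y) - \frac{1}{|S|} \sum_{j \in S}\bigl[ I(X_k; X_j) - I(X_k; X_j \mid Y) \bigr],

and at each step the predictor maximizing $J$ is added to $S$; the JMI form above is the usual averaged rewriting of $\sum_{j \in S} I(X_k, X_j; Y)$ up to terms not depending on $X_k$.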

Highlights

  • In the paper, we consider theoretical properties of Conditional Mutual Information (CMI) and its approximations in a certain dependence model called the Generative Tree Model (GTM).

  • We prove some results on information-theoretic properties of Gaussian mixtures which are necessary to analyze the behavior of CMI, Conditional Infomax Feature Extraction (CIFE), and Joint Mutual Information (JMI) in Generative Tree Models.

  • We define a special Gaussian Generative Tree Model and investigate how the greedy procedure based on (14), as well as its analogues with CMI replaced by JMI or CIFE, behaves in this model (a minimal sketch of such a greedy procedure is given after this list).
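
As a point of reference, greedy CMI-based forward selection of the kind alluded to in the last highlight can be sketched as follows. This is a minimal illustration with a naive plug-in estimator for discrete (or discretized) data, not the procedure or estimator analyzed in the paper; the names greedy_cmi_selection, cmi, and plug_in_entropy are ours.

    from collections import Counter

    import numpy as np


    def plug_in_entropy(columns):
        """Plug-in Shannon entropy (in nats) of the empirical joint distribution of the rows."""
        counts = Counter(map(tuple, columns))
        p = np.array(list(counts.values()), dtype=float)
        p /= p.sum()
        return float(-(p * np.log(p)).sum())


    def cmi(x, y, z):
        """Estimate I(X; Y | Z) via H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
        x = x.reshape(len(x), -1)
        y = y.reshape(len(y), -1)
        return (plug_in_entropy(np.hstack([x, z]))
                + plug_in_entropy(np.hstack([y, z]))
                - plug_in_entropy(np.hstack([x, y, z]))
                - plug_in_entropy(z))


    def greedy_cmi_selection(X, y, k):
        """At each step add the predictor j maximizing I(X_j; Y | already selected)."""
        selected = []
        remaining = set(range(X.shape[1]))
        for _ in range(min(k, X.shape[1])):
            z = X[:, selected]                      # conditioning set, shape (n, len(selected))
            best = max(remaining, key=lambda j: cmi(X[:, j], y, z))
            selected.append(best)
            remaining.remove(best)
        return selected

Replacing cmi by an analogous score built from the CIFE or JMI formulas yields the approximate procedures compared in the paper; for the continuous Gaussian GTM studied there, the predictors would first have to be discretized or a dedicated CMI estimator used.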

Summary

Introduction

We consider theoretical properties of Conditional Mutual Information (CMI) and its approximations in a certain dependence model called the Generative Tree Model (GTM). CMI and its modifications are used in many problems of machine learning, including feature selection, variable importance ranking, causal discovery, and structure learning of dependence networks (see, e.g., References [1,2]). We stress that our approach is intrinsically nonparametric and focuses on using nonparametric measures of conditional dependence for feature selection. By studying their theoretical behavior for this task, we learn the average behavior of their empirical counterparts for large sample sizes. Besides its explainable dependence structure, the distributions of predictors in the considered model are Gaussian mixtures, which facilitates the calculation of explicit forms of information-based selection criteria.
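
To make the last point concrete, a standard identity (not the paper's specific derivation) for a mixture $X$ in $\mathbb{R}^d$ with mixing variable $Z$, $P(Z=k)=\pi_k$ and $X \mid Z=k \sim N(\mu_k, \Sigma_k)$, is

    I(X; Z) = h(X) - h(X \mid Z) = h(X) - \sum_k \pi_k \, \tfrac{1}{2}\log\bigl( (2\pi e)^d \det \Sigma_k \bigr),

where $h$ denotes differential entropy. The mixture entropy $h(X)$, corresponding to the density $\sum_k \pi_k \varphi(x; \mu_k, \Sigma_k)$, has no simple closed form in general; the paper establishes explicit expressions for it, and for the resulting mutual information with the mixing distribution, in the GTM setting.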

  • Preliminaries
      • Information-Theoretic Measures of Dependence
      • Information-Based Feature Selection
      • Approximations of CMI
      • Auxiliary Results
  • Main Results
      • Generative Tree Model
      • Behavior of CMI
      • Behavior of JMI
      • Behavior of CIFE and Its Comparison with JMI
  • Conclusions