Abstract

What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this—how much we can gain from two simple classifier-independent strategies. If we have some binary labelled data and some unlabelled, we could assume the unlabelled data are all positives, or assume them all negatives. These minimalist, seemingly naive, approaches have not previously been studied in depth. However, with theoretical and empirical studies, we show they provide powerful results for feature selection, via hypothesis testing and feature ranking. Combining them with some “soft” prior knowledge of the domain, we derive two novel algorithms (Semi-JMI, Semi-IAMB) that outperform significantly more complex competing methods, showing particularly good performance when the labels are missing-not-at-random. We conclude that simple approaches to this problem can work surprisingly well, and in many situations we can provably recover the exact feature selection dynamics, as if we had labelled the entire dataset.
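The following is a minimal, illustrative sketch (not the paper's implementation) of the simpler of the two strategies described above: build the surrogate label Y1 by treating every unlabelled example as positive, then rank discrete features by their estimated mutual information with Y1. All names, data, and parameters below are hypothetical.

    import numpy as np

    def mutual_information(x, y):
        """Plug-in estimate of I(X;Y) in nats for two discrete vectors."""
        xs, x_idx = np.unique(x, return_inverse=True)
        ys, y_idx = np.unique(y, return_inverse=True)
        joint = np.zeros((len(xs), len(ys)))
        np.add.at(joint, (x_idx, y_idx), 1)       # contingency counts
        joint /= joint.sum()                      # joint distribution p(x, y)
        px = joint.sum(axis=1, keepdims=True)     # marginal p(x)
        py = joint.sum(axis=0, keepdims=True)     # marginal p(y)
        nz = joint > 0
        return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

    rng = np.random.default_rng(0)
    n, d = 5000, 5
    y = rng.integers(0, 2, size=n)                       # true labels, mostly hidden below
    X = rng.integers(0, 2, size=(n, d))                  # binary features
    X[:, 0] = np.where(rng.random(n) < 0.8, y, X[:, 0])  # only feature 0 is relevant

    labelled = rng.random(n) < 0.1                       # 10% of labels observed
    y1 = np.where(labelled, y, 1)                        # surrogate: unlabelled -> positive

    scores = [mutual_information(X[:, k], y1) for k in range(d)]
    print("ranking by I(X_k; Y1):", np.argsort(scores)[::-1])  # feature 0 should rank first

The symmetric strategy builds Y0 by treating every unlabelled example as negative; the paper combines the two, together with "soft" prior knowledge of the domain, to obtain Semi-JMI and Semi-IAMB.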

Highlights

  • Many real-world applications have limited access to labelled data, but abundant access to large amounts of unlabelled data

  • The first contribution asks what happens to the false positive rate (FPR) and the false negative rate (FNR) if we test with the surrogate variables Y0 or Y1, i.e. using the statistics G(X; Y0) or G(X; Y1) instead of the ideal G(X; Y); the answer is proven in Sect. 3 (a minimal sketch of such a surrogate test appears after this list)

  • In this context, discovering the Markov Blanket (MB) can be useful for eliminating irrelevant features or features that are redundant in the context of others, and as a result it plays a fundamental role in filter feature selection
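To make the highlighted question concrete, here is a minimal sketch (illustrative code, not the authors') of testing independence between a discrete feature X and either the fully observed labels Y or the surrogates Y0/Y1, obtained by treating every unlabelled example as negative or positive respectively. The G-statistic is computed via scipy's log-likelihood-ratio option; the helper names and toy data are assumptions.

    import numpy as np
    from scipy.stats import chi2_contingency

    rng = np.random.default_rng(0)

    # Toy data: a binary feature X that agrees with the binary class Y 70% of the time.
    n = 2000
    y = rng.integers(0, 2, size=n)
    x = np.where(rng.random(n) < 0.7, y, 1 - y)

    # Hide 80% of the labels completely at random; s=True means "label observed".
    s = rng.random(n) < 0.2

    def make_surrogate(y, s, fill_value):
        """Replace every unobserved label with a constant: 0 gives Y0, 1 gives Y1."""
        return np.where(s, y, fill_value)

    def g_test(x, y):
        """G-test of independence between two binary vectors (log-likelihood ratio)."""
        table = np.zeros((2, 2))
        np.add.at(table, (x, y), 1)
        g, p, dof, _ = chi2_contingency(table, correction=False, lambda_="log-likelihood")
        return g, p

    for name, labels in [("ideal     G(X;Y) ", y),
                         ("surrogate G(X;Y0)", make_surrogate(y, s, 0)),
                         ("surrogate G(X;Y1)", make_surrogate(y, s, 1))]:
        g, p = g_test(x, labels)
        print(f"{name}: G = {g:8.1f}, p = {p:.3g}")

How the FPR and FNR of such surrogate tests relate to those of the ideal test is exactly what Sect. 3 of the paper characterises.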

Summary

Introduction

Many real-world applications have limited access to labelled data but abundant access to unlabelled data. We tackle two semi-supervised scenarios: one where the labels are missing completely at random (MCAR), and a missing-not-at-random scenario (MAR-C) in which the class labels are missing according to a mechanism that depends on the class label itself (Moreno-Torres et al. 2012). The latter can occur, for example, when a social stigma is associated with reporting a label, such as income levels or HIV incidence. We exploit the properties of these missingness mechanisms to derive novel feature selection algorithms, which turn out to be highly competitive with significantly more complex procedures.
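As a hypothetical illustration (not from the paper) of the difference between the two mechanisms: under MCAR the labelled subsample preserves the class prior, while under MAR-C, where the chance of observing a label depends on the class itself, the labelled subsample exhibits a shifted prior. All probabilities below are made up.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 10_000
    y = rng.integers(0, 2, size=n)                  # true binary class labels

    # MCAR: every label is observed with the same probability, regardless of y.
    s_mcar = rng.random(n) < 0.3

    # MAR-C: the observation probability depends on the class label itself,
    # e.g. a stigmatised positive label is reported far less often.
    p_observe_given_y = np.array([0.4, 0.1])        # P(observed | y=0), P(observed | y=1)
    s_marc = rng.random(n) < p_observe_given_y[y]

    for name, s in [("MCAR ", s_mcar), ("MAR-C", s_marc)]:
        print(f"{name}: {s.mean():.0%} labelled, "
              f"P(y=1 | labelled) = {y[s].mean():.2f}, true prior = {y.mean():.2f}")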

Summary of results
Background
Feature selection by testing independence—Markov Blanket discovery
Phase II: backward — shrinkage
Feature selection by ranking—information theoretic methods
Semi-supervised learning
Motivating an inference-free approach and related work
Surrogate approaches for hypothesis testing
Conditional independence tests in semi-supervised learning
The switching procedure applied to Markov Blanket discovery—Semi-IAMB
Surrogate approaches for feature ranking
Step 4
Extending to higher order criteria
Application 1
MB discovery in positive-unlabelled learning
Incorporating “exact” prior knowledge in sample size determination
Evaluation of MB discovery in PU data
MB discovery in semi-supervised learning under class-prior-change
Comparing information theoretic feature selection approaches
Exploring the consistency of the selected subsets
Exploring the misclassification error
Comparison with state-of-the-art semi-supervised feature selection methods
Summary of contributions
Future work
A Tutorial on information theoretic testing and estimation
Theorem 1
Theorem 2
Theorem 3
Theorem 4
Theorem 6
Theorem 7
