Abstract

The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and output values. In this letter, we consider a feature-wise kernelized Lasso for capturing nonlinear input-output dependency. We first show that with particular choices of kernel functions, nonredundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures such as the Hilbert-Schmidt independence criterion. We then show that the globally optimal solution can be efficiently computed; this makes the approach scalable to high-dimensional problems. The effectiveness of the proposed method is demonstrated through feature selection experiments for classification and regression with thousands of features.
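The kernel-based independence measure mentioned above can be made concrete with a small sketch: the empirical HSIC between two scalar samples is computed from centered Gram matrices. The Gaussian kernels and the fixed bandwidth below are illustrative assumptions; the kernel choices and normalization used in the letter may differ.

```python
import numpy as np

def gaussian_kernel(z, sigma=1.0):
    """Gram matrix of the Gaussian kernel for a 1-D sample vector z."""
    diff = z[:, None] - z[None, :]
    return np.exp(-diff ** 2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Empirical HSIC between 1-D samples x and y.

    Uses the standard biased estimator trace(K H L H) / (n - 1)^2,
    where H = I - (1/n) 11^T centers the Gram matrices K and L.
    """
    n = len(x)
    K = gaussian_kernel(x, sigma)
    L = gaussian_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=200)
print(hsic(x, x ** 2))                 # dependent pair: noticeably larger
print(hsic(x, rng.normal(size=200)))   # independent pair: near zero
```

Because HSIC detects the nonlinear dependence between x and x², which an ordinary correlation coefficient would miss, it is a natural score for the feature-wise kernelized Lasso described above.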

Highlights

  • Finding a subset of features in high-dimensional supervised learning is an important problem with many real-world applications such as gene selection from microarray data (Xing et al., 2001; Ding and Peng, 2005; Suzuki et al., 2009; Huang et al., 2010), document categorization (Forman, 2008), and prosthesis control (Shenoy et al., 2008)

  • The least absolute shrinkage and selection operator (Lasso) (Tibshirani, 1996) allows computationally efficient feature selection based on the assumption of linear dependency between input features and output values

  • We use kernel regression (KR) (Schölkopf and Smola, 2002) with the Gaussian kernel for evaluating the mean squared error and the mean correlation when the top m = 10, 20, ..., 50 features selected by each method are used
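
The evaluation protocol in the last highlight can be sketched with scikit-learn's kernel ridge regression. The correlation-based feature ranking below is only a stand-in for the selectors compared in the letter (any method's top-m indices would be plugged in), and the hyperparameter values are assumptions.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 300, 100
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

# Stand-in for a feature-selection method: rank features by absolute
# correlation with y (a real selector would supply these indices).
ranking = np.argsort(-np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(d)]))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for m in (10, 20, 30, 40, 50):
    idx = ranking[:m]
    kr = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1)  # assumed hyperparameters
    kr.fit(X_tr[:, idx], y_tr)
    mse = mean_squared_error(y_te, kr.predict(X_te[:, idx]))
    print(f"top {m:2d} features: test MSE = {mse:.3f}")
```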

Introduction

Let X (⊂ Rd) be the domain of input vector x and Y (⊂ R) be the domain of output value y. Suppose we are given n independent and identically distributed (i.i.d.) paired samples {(xi, yi)}, i = 1, ..., n. We denote the input data by X = [x1, ..., xn] and the output data by y = [y1, ..., yn]⊤. The goal of supervised feature selection is to find m features (m < d) of input vector x that are responsible for predicting output y. The least absolute shrinkage and selection operator (Lasso) (Tibshirani, 1996) allows computationally efficient feature selection based on the assumption of linear dependency between input features and output values. The Lasso optimization problem is given as

min over α ∈ Rd of (1/2) ||y − X⊤α||₂² + λ ||α||₁,

where α = (α1, ..., αd)⊤ is the regression coefficient vector, || · ||₂ and || · ||₁ denote the ℓ2 and ℓ1 norms, and λ ≥ 0 is a regularization parameter that controls the sparsity of α.
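
For reference, this ℓ1-penalized problem is what off-the-shelf Lasso solvers implement. The sketch below uses scikit-learn (whose objective rescales the squared loss by 1/(2n), so its `alpha` plays the role of λ) on synthetic data where only two features carry a linear signal; all values here are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d = 100, 50
X = rng.normal(size=(n, d))
# Only features 0 and 3 influence the output (linear dependency).
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + 0.1 * rng.normal(size=n)

# scikit-learn's Lasso minimizes (1/(2n)) ||y - Xw||_2^2 + alpha ||w||_1;
# the l1 penalty drives most coefficients exactly to zero.
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
selected = np.flatnonzero(lasso.coef_)
print(selected)
```

The nonzero coefficients identify the selected features, which is exactly the mechanism the feature-wise kernelized Lasso inherits while replacing the linear dependency assumption with kernel-based dependence.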
