Discovering Highly Informative Feature Set over High Dimensions

Chongsheng Zhang, F Masseglia, Xiangliang Zhang

doi:10.1109/ictai.2012.149

Open Access

https://doi.org/10.1109/ictai.2012.149

Abstract

For many textual collections, the number of features is very large, and many of these features are redundant. It is therefore desirable to have a small, succinct, yet highly informative collection of features that describes the key characteristics of a dataset. Information theory provides one tool for obtaining such a feature collection. In this paper, our main contribution is to improve the efficiency of selecting the most informative feature set over high-dimensional unlabeled data. We propose a heuristic theory for informative feature set selection from high-dimensional data. Moreover, we design data structures that enable us to compute the entropies of the candidate feature sets efficiently. We also develop a simple pruning strategy that eliminates hopeless candidates at each forward selection step. Experiments on real-world data sets show that our proposal is very efficient.
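The abstract does not spell out the algorithm, so the following is only a minimal sketch of what entropy-based forward selection with pruning can look like, not the authors' method. It assumes discrete feature values, scores a candidate set by its empirical joint entropy, and prunes with the standard bound H(S ∪ {f}) ≤ H(S) + H(f). The function names `joint_entropy` and `greedy_informative_set` are hypothetical, and the paper's own data structures and pruning rule may differ.

```python
from collections import Counter
from math import log2


def joint_entropy(rows, features):
    """Empirical joint entropy (in bits) of the given feature columns."""
    n = len(rows)
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def greedy_informative_set(rows, k):
    """Greedy forward selection of up to k features by joint entropy.

    Illustrative assumption, not the paper's algorithm: a candidate f is
    skipped when the upper bound H(S) + H(f) cannot beat the best joint
    entropy already found in the current step.
    """
    n_features = len(rows[0])
    single = {f: joint_entropy(rows, [f]) for f in range(n_features)}
    selected, current = [], 0.0
    for _ in range(min(k, n_features)):
        best_f, best_h = None, -1.0
        # Visit candidates by decreasing single-feature entropy, so that once
        # one bound fails, every remaining candidate's bound fails as well.
        candidates = sorted(
            (f for f in single if f not in selected),
            key=lambda f: -single[f],
        )
        for f in candidates:
            if current + single[f] <= best_h:
                break  # prune this candidate and all that follow
            h = joint_entropy(rows, selected + [f])
            if h > best_h:
                best_f, best_h = f, h
        selected.append(best_f)
        current = best_h
    return selected, current


# Toy data: columns 0 and 2 are identical (redundant), column 1 is independent.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
selected, entropy = greedy_informative_set(rows, 2)
print(selected, entropy)
```

On the toy data, the redundant third column is never evaluated in the second step because its bound cannot exceed the joint entropy already reached with the second column.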

Publication Date: Nov 1, 2012
Citations: 10
License type: other-oa

Similar Papers

A New Feature Sampling Method in Random Forests for Predicting High-Dimensional Data
Thanh-Tung Nguyen ... He Zhao
01 Jan 2015

Improved evaluation of existing methods in landscape analysis and comparison of black box optimization problems using regression models
Sobia Saleem
23 Apr 2021

A new evolutionary algorithm for mining top-k discriminative patterns in high dimensional data
Tarcísio Lucas ... Teresa B Ludermir
Applied Soft Computing | VOL. 59
08 Jun 2017

Branching and Circular Features in High Dimensional Data
Bei Wang ... M Vejdemo-Johansson
IEEE Transactions on Visualization and Computer Graphics | VOL. 17
01 Dec 2011
