An Empirical Study of Several Information Theoretic Based Feature Extraction Methods for Classifying High Dimensional Low Sample Size Data

Sheena Leeza Verghese, Iman Yi Liao, Siang Yew Chong, Tomas H Maul

doi:10.1109/access.2021.3077958

Open Access

https://doi.org/10.1109/access.2021.3077958

Journal: IEEE Access
Publication Date: Jan 1, 2021
Citations: 28
License type: CC BY-NC-ND 4.0

Affiliation: University of Nottingham Malaysia Campus

Abstract

A high dimensional low sample size (HDLSS) dataset contains many features but only a limited number of samples. Such datasets are common in domains such as microarray analysis and medical imaging. When the sample size is small, the population probability density function (PDF) of an HDLSS dataset may not be well represented, which makes it difficult to apply feature selection or feature extraction methods to HDLSS data classification. In this paper, we explore the possibility of designing feature selection and feature extraction methods for HDLSS data classification by making only loose assumptions about the underlying PDF of an HDLSS dataset. Specifically, we propose to leverage Correlation Explanation (CorEx), a recent unsupervised probabilistic graphical model that assumes a (hierarchical) hidden structure, to generate subsets of features that are conditionally independent. We benchmark the proposed method against frequently cited information-theoretic feature extraction and feature selection methods, including Conditional Infomax Feature Extraction (CIFE), Maximum Relevance Minimum Redundancy (MRMR), Maximization of Mutual Information (MMI), Infomax Independent Component Analysis (Infomax ICA), and Kernel Entropy Component Analysis (KECA). The HDLSS datasets used in this study are the Breast Cancer datasets by Gravier et al. and West et al., the Colon Cancer dataset by Alon et al., the Leukemia dataset by Golub et al., and the Gisette dataset used by Guyon et al. Experimental results demonstrate that the proposed method shows some improvement in classification performance over MMI and Infomax ICA, and is competitive with MRMR and CIFE.
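
To make one of the benchmarked baselines concrete, the following is a minimal sketch of greedy MRMR feature selection. It is not the authors' implementation. It assumes discrete features, a plug-in (empirical count) estimate of mutual information, and the "relevance minus mean redundancy" form of the MRMR criterion. The function names `mutual_information` and `mrmr_select` and the eight-sample toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(columns, labels, k):
    """Greedy MRMR (assumed difference form): at each step pick the feature with
    the largest relevance I(f; y) minus mean redundancy with the chosen features."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    candidates = set(range(len(columns)))
    while candidates and len(selected) < k:

        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        # sorted() makes tie-breaking deterministic (lowest index wins)
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: 8 samples, 4 binary features stored column-wise.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 0],  # feature 0: strongly related to the label
    [0, 0, 0, 0, 1, 1, 1, 0],  # feature 1: exact duplicate of feature 0
    [0, 0, 0, 1, 0, 1, 1, 1],  # feature 2: weaker, but not redundant with 0
    [0, 1, 0, 1, 0, 1, 0, 1],  # feature 3: independent of the label
]

selected = mrmr_select(columns, labels, k=2)
print(selected)  # [0, 2]
```

In this toy example a ranking by relevance alone would return features 0 and 1, which carry identical information. The redundancy term pushes the duplicate below the weaker but distinct feature 2. With very few samples, plug-in mutual information estimates are noisy, which is one face of the HDLSS difficulty the abstract describes.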

Similar Papers

A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data of Glioma
Heba Abusamra
Procedia Computer Science | VOL. 23
01 Jan 2013

An effective feature selection method based on pair-wise feature proximity for high dimensional low sample size data
S L Happy ... Aurobinda Routray
01 Aug 2017

A Comparative Study on Gene Selection Methods for Tissues Classification on Large Scale Gene Expression Data
Farzana Kabir Ahmad
Jurnal Teknologi | VOL. 78
30 May 2016

Improved microarray data analysis using feature selection methods with machine learning methods
Jing Sun ... Chakresh Kumar Jain
01 Dec 2016
