Subsumption reduces dataset dimensionality without decreasing performance of a machine learning classifier.

Donald C Wunsch, Daniel B Hier

doi:10.1109/embc46164.2021.9629897

Abstract

When features in a high-dimensional dataset are organized hierarchically, there is an inherent opportunity to reduce dimensionality. Since more specific concepts are subsumed by more general concepts, subsumption can be applied successively to reduce dimensionality. We tested whether subsumption could reduce the dimensionality of a disease dataset without impairing classification accuracy. We started with a dataset that had 168 neurological patients, 14 diagnoses, and 293 unique features. We applied subsumption repeatedly to create eight successively smaller datasets, ranging from 293 dimensions in the largest dataset to 11 dimensions in the smallest dataset. We tested a multilayer perceptron (MLP) classifier on all eight datasets. Precision, recall, accuracy, and validation declined only at the lowest dimensionality. Our preliminary results suggest that when features in a high-dimensional dataset are derived from a hierarchical ontology, subsumption is a viable strategy to reduce dimensionality.

Clinical relevance: Datasets derived from electronic health records are often of high dimensionality. If features in the dataset are based on concepts from a hierarchical ontology, subsumption can reduce dimensionality.
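As an illustration of the idea in the abstract, the following is a minimal sketch of repeated subsumption on a toy dataset. It is not the authors' implementation: the hierarchy in `PARENT`, the patient records in `PATIENTS`, and the function names are all hypothetical, and the sketch assumes that one pass replaces every feature with its immediate parent (features with no parent are left unchanged).

```python
from typing import Dict, List, Set

# Hypothetical child -> parent links; a real study would take these
# from its own hierarchical ontology.
PARENT: Dict[str, str] = {
    "resting tremor": "tremor",
    "intention tremor": "tremor",
    "tremor": "movement disorder",
    "chorea": "movement disorder",
    "foot drop": "leg weakness",
    "leg weakness": "weakness",
    "arm weakness": "weakness",
}


def subsume_once(features: Set[str], parent: Dict[str, str]) -> Set[str]:
    """Replace each feature with its parent concept; root concepts stay as they are."""
    return {parent.get(f, f) for f in features}


def vocabulary(patients: Dict[str, Set[str]]) -> Set[str]:
    """Collect the distinct features used by any patient (the dataset's dimensions)."""
    vocab: Set[str] = set()
    for feats in patients.values():
        vocab |= feats
    return vocab


def dimension_schedule(
    patients: Dict[str, Set[str]], parent: Dict[str, str], passes: int
) -> List[int]:
    """Return the number of dimensions before subsumption and after each pass."""
    sizes = [len(vocabulary(patients))]
    current = patients
    for _ in range(passes):
        current = {pid: subsume_once(feats, parent) for pid, feats in current.items()}
        sizes.append(len(vocabulary(current)))
    return sizes


# Hypothetical patient records, each a set of observed features.
PATIENTS: Dict[str, Set[str]] = {
    "p1": {"resting tremor", "foot drop"},
    "p2": {"intention tremor", "arm weakness"},
    "p3": {"chorea", "leg weakness"},
}

SIZES = dimension_schedule(PATIENTS, PARENT, passes=2)
print(SIZES)
```

In this toy example the number of distinct features falls from 6 to 4 to 2 over two passes, because sibling concepts collapse into their shared parent. After the second pass all three toy patients have identical feature sets, which mirrors the abstract's observation that classifier performance declined only at the lowest dimensionality.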

Journal: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
Publication Date: Nov 1, 2021
Citations: 2

Similar Papers

A Meta-Analysis Survey on the Usage of Meta-Heuristic Algorithms for Feature Selection on High-Dimensional Datasets
Li Yu Yab ... Rahayu A Hamid
IEEE Access | VOL. 10
01 Jan 2021

On Feature Selection and Rule Extraction for High Dimensional Data: A Case of Diffuse Large B-Cell Lymphomas Microarrays Classification
Narissara Eiamkanitchat ... Sansanee Auephanwiriyakul
Mathematical Problems in Engineering | VOL. 2015
01 Jan 2015

A Grid-Based Scalable Classifier for High Dimensional Datasets
Sheetal Saini ... Sumeet Dua
01 Jan 2009

Finding Feature Relationships and Relevant Features in Large Datasets using FPGAs
John C Porcello
04 Mar 2023
