Abstract

Multiclass classification in cancer diagnostics using DNA or gene expression signatures, as well as classification of bacterial species fingerprints in MALDI-TOF mass spectrometry data, is challenging because of imbalanced data and the high number of dimensions relative to the number of instances. In this study, a new oversampling technique called LICIC will be presented as a valuable instrument for countering both class imbalance and the well-known “curse of dimensionality” problem. The method preserves non-linearities within the dataset while creating new instances without adding noise. The method will be compared with other oversampling methods, such as Random Oversampling, SMOTE, Borderline-SMOTE, and ADASYN. F1 scores show the validity of this new technique when used with imbalanced, multiclass, and high-dimensional datasets.

Highlights

  • The between-class imbalance is a well-known problem that afflicts numerous datasets. The classification task becomes even more difficult when there are very few instances in the dataset, a few hundred for example, and when each instance is composed of thousands of dimensions

  • In [3], the authors develop the idea of SMOTE with SVM classifiers to deal with class imbalance problems; artificial minority-class instances are generated around the borderline between two data classes

  • It has been shown that LICIC with linear KPCA is useful when the number of dimensions is very high
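The SMOTE-style interpolation referenced in the highlights can be sketched as follows; this is a minimal illustration of the classic SMOTE idea (synthetic points on the line segment between a minority instance and one of its minority neighbours), not the authors' implementation, and the function name `smote_oversample` is our own:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority instances: each one lies on the
    segment between a minority instance and one of its k nearest
    minority-class neighbours (plain SMOTE interpolation)."""
    rng = np.random.default_rng(rng)
    # k + 1 neighbours because each point is its own nearest neighbour
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)  # idx[:, 0] is the point itself
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))           # random minority instance
        j = idx[i, rng.integers(1, k + 1)]     # one of its k neighbours
        gap = rng.uniform()                    # random position on the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Borderline-SMOTE and the SVM variant differ only in *which* minority instances are selected for interpolation (those near the class boundary), not in the interpolation step itself.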


Summary

Introduction

The between-class imbalance is a well-known problem that afflicts numerous datasets. The classification task becomes even more difficult when there are very few instances in the dataset, a few hundred for example, and when each instance is composed of thousands of dimensions. The proposed method, LICIC (Less Important Components for Imbalanced multiclass Classification), is designed to deal with datasets that have fewer instances than dimensions, and where there is a strong skewness between the number of instances of different classes. It operates in “feature space” rather than “data space”, preserving the non-linearities present in the dataset. It applies kernel PCA [6] to the whole dataset and works in the Φ(x)-transformed space, permuting the less important components to create new synthetic instances for each minority class.
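The pipeline described above (kernel PCA on the whole dataset, permutation of less important components between same-class instances, pre-image back to data space) can be sketched as below. This is our reading of the description, not the authors' code: the function name `licic_like_oversample`, the use of scikit-learn's `KernelPCA` with its built-in pre-image computation, and the choice of `n_keep` are all illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def licic_like_oversample(X, y, minority_label, n_new, n_keep, rng=None):
    """LICIC-style oversampling sketch: fit kernel PCA on the whole dataset,
    keep each minority instance's n_keep most important components, swap its
    less important components with those of another same-class instance,
    then map the result back to data space via the pre-image."""
    rng = np.random.default_rng(rng)
    # fit_inverse_transform=True makes KernelPCA learn a pre-image map
    kpca = KernelPCA(kernel="linear", fit_inverse_transform=True)
    Z = kpca.fit_transform(X)               # whole dataset, Phi(x) space
    Z_min = Z[y == minority_label]
    synthetic = []
    for _ in range(n_new):
        i, j = rng.choice(len(Z_min), size=2, replace=False)
        z = Z_min[i].copy()
        z[n_keep:] = Z_min[j][n_keep:]      # permute less important components
        synthetic.append(z)
    # pre-image: back from feature space to the original data space
    return kpca.inverse_transform(np.array(synthetic))
```

Because the important components of each instance are untouched, the synthetic points stay close to the minority-class structure captured by the kernel, which is how the method avoids injecting noise.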

Literature Review
Dataset Description
Kernel Principal Components Analysis with Pre-Image Computation
Linear
LICIC Algorithm
Experiments and Results
MicroMass Dataset Results
F1-Micro
Learning
GCM Dataset Results
Conclusions