Abstract

Given the complexity of real-world datasets, it is difficult for existing deep learning (DL) models to represent their data structures. Most research to date has concentrated on datasets with only one type of attribute: categorical or numerical. Categorical data are common in datasets such as the German (categorical) credit scoring dataset, which contains numerical, ordinal, and nominal attributes. The heterogeneous structure of this dataset makes very high accuracy difficult to achieve. DL-based methods have achieved high accuracy (99.68%) for the Wisconsin Breast Cancer Dataset, whereas DL-inspired methods have achieved high accuracy (97.39%) for the Australian credit dataset. However, to our knowledge, no such method has been proposed to classify the German credit dataset. This study aimed to provide new insights into why DL-based and DL-inspired classifiers do not work well for categorical datasets consisting mainly of nominal attributes. We also discuss the problems associated with using nominal attributes to design high-performance classifiers. Considering the expanded utility of DL, this study's findings should aid in the development of a new type of DL that can handle categorical datasets consisting mainly of nominal attributes, which are commonly used in risk evaluation, finance, banking, and marketing.
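
To make concrete what such a heterogeneous dataset involves in practice, the sketch below shows one common way of preparing the German (categorical) credit data for a neural classifier: nominal attributes are one-hot encoded and numerical attributes are standardized before training. This is an illustrative sketch under stated assumptions, not the pipeline evaluated in this study; the OpenML dataset name "credit-g", the network size, and the 10-fold evaluation are choices made here only for demonstration.

```python
# Illustrative sketch (not the authors' method): preparing a mixed
# nominal/numerical dataset for a neural classifier.
# Assumption: the German (categorical) credit data is available on OpenML
# under the name "credit-g".
from sklearn.datasets import fetch_openml
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

X, y = fetch_openml("credit-g", version=1, as_frame=True, return_X_y=True)

# Split columns by dtype: nominal/ordinal attributes arrive as pandas
# categoricals, numerical attributes as numbers.
nominal_cols = X.select_dtypes(include=["category", "object"]).columns
numeric_cols = X.select_dtypes(include="number").columns

preprocess = ColumnTransformer([
    ("nominal", OneHotEncoder(handle_unknown="ignore"), nominal_cols),
    ("numeric", StandardScaler(), numeric_cols),
])

# A small multilayer perceptron; the layer sizes are arbitrary placeholders.
clf = Pipeline([
    ("prep", preprocess),
    ("mlp", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                          random_state=0)),
])

# 10-fold cross-validation (10CV), the protocol mentioned in the highlights.
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(f"mean 10CV accuracy: {scores.mean():.3f}")
```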

Highlights

  • Wolpert [4,5] described what has come to be known as the no free lunch (NFL) theorem, which implies that all learning algorithms perform equally well when averaged over all possible datasets

  • Considering the expanded utility of deep learning (DL), this study's findings should aid in the development of a new type of DL that can handle categorical datasets consisting mainly of nominal attributes, which are commonly used in risk evaluation, finance, banking, and marketing

  • We investigated the Wisconsin Breast Cancer Dataset (WBCD) using Zhou and Feng's codes. "----" means that the literature provided no information about the area under the receiver operating characteristic curve (AUC-ROC); TS ACC: accuracy for the test dataset; SVM: support vector machine; 10CV: 10-fold cross-validation; 1D FCLF-CNN: one-dimensional fully-connected layer first convolutional neural network (CNN)

Introduction

Wolpert [4,5] described what has come to be known as the no free lunch (NFL) theorem, which implies that all learning algorithms perform equally well when averaged over all possible datasets. This counterintuitive result suggests the infeasibility of finding a general, highly predictive algorithm. Gómez and Rojas [6] subsequently investigated the effects of the NFL theorem empirically on several popular machine learning (ML) classification techniques using real-world datasets.
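
As an illustration of this kind of empirical comparison (not a reproduction of Gómez and Rojas's experiments), the sketch below cross-validates a few standard classifiers on two publicly available datasets. Consistent with the NFL perspective, the top-ranked model generally differs from one dataset to another; the datasets, models, and fold count here are arbitrary choices made for brevity.

```python
# Illustrative sketch: ranking a few standard classifiers across datasets
# with 10-fold cross-validation. The datasets and models are placeholders,
# not those used by Gómez and Rojas.
from sklearn.datasets import load_breast_cancer, load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

datasets = {
    "breast_cancer": load_breast_cancer(return_X_y=True),
    "wine": load_wine(return_X_y=True),
}
models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "RandomForest": RandomForestClassifier(random_state=0),
    "LogReg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

# The classifier with the best mean accuracy typically changes per dataset.
for dname, (X, y) in datasets.items():
    for mname, model in models.items():
        acc = cross_val_score(model, X, y, cv=10).mean()
        print(f"{dname:>13s}  {mname:<12s} mean 10CV accuracy = {acc:.3f}")
```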
