Abstract

Selecting the model that best fits the data at hand is a critical step in any analysis. For classification and regression on tabular data, many researchers have proposed ensemble techniques (such as boosting and logistic model tree ensembles) alongside other approaches. More recently, several deep learning architectures have been applied to tabular data, with their authors claiming that the deep models outperform boosting and model tree approaches. This study compares three of these recent deep models (TabNet, NODE, and DNF-Net) against a boosting model (XGBoost) on a range of datasets, including historical geographical data, to determine whether they should be regarded as a preferred choice for tabular data. We examine how much tuning and computation each model requires, as well as how each performs under metric evaluation and statistical significance tests. In our study, XGBoost outperforms the deep models across all datasets, including the datasets used in the papers that introduced those models. We further show that XGBoost requires considerably less tuning than the deep models. Finally, we confirm that an ensemble combining the deep models with XGBoost outperforms XGBoost alone on almost all datasets.
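The ensemble of deep models with XGBoost mentioned above can be illustrated with a minimal sketch. The snippet below blends the predicted class probabilities of two already-trained models by weighted averaging; the probability arrays and the blend weight are hypothetical stand-ins, not values from the study, and the study's actual combination scheme may differ.

```python
import numpy as np

# Hypothetical class-1 probabilities from two already-trained models
# (stand-ins for real XGBoost and deep-model outputs on four test rows).
xgb_proba = np.array([0.90, 0.20, 0.65, 0.40])
deep_proba = np.array([0.80, 0.35, 0.55, 0.30])

def blend(p_a, p_b, weight_a=0.5):
    """Weighted average of two models' predicted probabilities."""
    return weight_a * p_a + (1.0 - weight_a) * p_b

# Blend with an assumed 0.6/0.4 weighting, then threshold at 0.5.
ensemble_proba = blend(xgb_proba, deep_proba, weight_a=0.6)
ensemble_pred = (ensemble_proba >= 0.5).astype(int)
```

In practice the blend weight would itself be chosen on a validation set, which is one reason an ensemble of this kind can beat either component model alone.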
