ABSTRACT This study aims to develop a model for predicting tax non-compliance in Tunisia using supervised machine learning algorithms. A data mining analysis was conducted following the Knowledge Discovery in Databases (KDD) process, using a dataset of 20,930 labeled observations from 2013 to 2017 with 110 attributes. We compared five supervised learning algorithms (K-Nearest Neighbors, Decision Trees, Naïve Bayes, Gradient Boosting, and Random Forest) to identify the most accurate model. Random Forest outperformed the other algorithms, achieving a prediction accuracy of 83%. Furthermore, by jointly interpreting the feature importance derived from Random Forest, SHAP value analysis, and ANOVA, we give tax auditors insight into the attributes that most influence the prediction of tax non-compliance. The study has practical implications: it can make tax audits more efficient and support tax authorities in their efforts to combat tax non-compliance.
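The model-selection step described above (fit several classifiers on labeled data, score each on held-out observations, keep the most accurate) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than the study's tax dataset, only a from-scratch K-Nearest Neighbors classifier and a majority-class baseline are compared instead of the five algorithms named above, and the helper names (`make_data`, `knn_classifier`, `majority_classifier`, `accuracy`) are hypothetical. It uses only the Python standard library.

```python
import random
from collections import Counter


def make_data(n, seed=0):
    """Synthetic stand-in data: two numeric features, binary label (1 = non-compliant)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Feature means shift with the label so the classes are separable.
        features = (rng.gauss(label * 2.0, 1.0), rng.gauss(label * 2.0, 1.0))
        rows.append((features, label))
    return rows


def knn_classifier(train, k=5):
    """Return a predictor that votes among the k nearest training points."""
    def predict(x):
        nearest = sorted(
            train,
            key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
        )[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def majority_classifier(train):
    """Baseline predictor: always return the most frequent training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: majority


def accuracy(predict, test):
    """Fraction of held-out observations whose label is predicted correctly."""
    return sum(1 for x, y in test if predict(x) == y) / len(test)


data = make_data(400)
train, test = data[:300], data[300:]  # simple holdout split

scores = {
    "knn": accuracy(knn_classifier(train), test),
    "majority": accuracy(majority_classifier(train), test),
}
best = max(scores, key=scores.get)  # model with the highest holdout accuracy
print(scores, "best:", best)
```

The same loop extends to any number of candidate models: each one is trained on the same split and ranked by the same metric, which is what makes the reported accuracies comparable.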