This study investigates how effectively machine learning models predict audit opinions, using a dataset from the FiinPro-X platform of 9,783 audited consolidated financial statements from public companies listed on Vietnamese stock exchanges between 2016 and 2023. The dataset spans various industries, excluding banks and financial institutions, and the analysis focuses on identifying the key financial, non-financial, and qualitative variables that influence audit opinions. Six supervised learning algorithms were applied: Logistic Regression, K-Nearest Neighbors (KNN), Decision Trees, Random Forests, Support Vector Machines (SVM), and Naive Bayes. Each was evaluated on its ability to predict both fully acceptable (unqualified) and non-fully acceptable audit opinions. All data processing and model training were implemented in a Python environment. The Random Forest model showed the best overall performance, with an accuracy of 0.868 and an AUC-ROC of 0.87, although its F1 score for non-fully acceptable audit opinions was considerably lower (0.585). This suggests that while machine learning models can improve prediction accuracy, challenges remain in handling imbalanced data and non-linear relationships among input variables. The study also reduced the number of features by 30%, which improved the models’ performance. Future research should further refine the data and feature construction processes to ensure comparability and practical applicability.
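The gap between a high overall accuracy and a lower F1 score on the minority class can be illustrated with a short sketch. The confusion-matrix counts below are hypothetical and chosen only to show the effect of class imbalance; they are not the study's data, and the function name `minority_metrics` is likewise an assumption made for illustration.

```python
def minority_metrics(tp, fp, fn, tn):
    """Accuracy plus precision, recall and F1 for the positive class.

    Here the positive class stands for non-fully acceptable opinions
    (the minority class). All counts are hypothetical.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when nothing is predicted positive
    # or when there are no actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test set: 1,000 statements, 150 with a non-fully
# acceptable opinion (15% minority class).
model = minority_metrics(tp=90, fp=50, fn=60, tn=800)

# Trivial baseline that always predicts "unqualified".
baseline = minority_metrics(tp=0, fp=0, fn=150, tn=850)

for name, m in (("model", model), ("baseline", baseline)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

In this made-up setting the trivial baseline already reaches an accuracy of 0.85 while its minority-class F1 is zero, which is why accuracy alone says little about how well non-fully acceptable opinions are detected.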