Predicting Corporate Profitability in Morocco: Comparing Classical Regression and Machine Learning

Abstract

To the best of our knowledge, this study provides the first systematic comparison between classical regression and advanced machine learning models for predicting the profitability of Moroccan firms listed on the Casablanca Stock Exchange. While prior research has largely focused on developed markets, profitability prediction in emerging economies such as Morocco remains underexplored, despite the market’s structural particularities (sectoral concentration, reliance on bank financing, and limited disclosure practices). The comparison highlights the advantages and limitations of each approach in capturing complex and non-linear financial dynamics. Using a dataset covering ten years of financial statements, we evaluate multiple models: OLS, Ridge regression, Random Forest, Gradient Boosting, Support Vector Regression, KNN, and XGBoost. Results show that machine learning models consistently outperform regression in predictive accuracy, while regression retains value in interpretability. The findings contribute to academic research by extending profitability forecasting studies to an underexplored emerging market, and to practice by offering investors, policymakers, and managers tools that improve risk assessment, capital allocation, and decision-making under uncertainty. These implications are particularly relevant for emerging economies, where informational asymmetries and structural heterogeneity complicate financial forecasting.
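As a rough illustration of the kind of comparison the abstract describes, the sketch below fits a simple OLS line and a K-nearest-neighbours regressor to synthetic data and compares their hold-out RMSE. It is not the authors' code or data: the variable names, the inverted-U relationship between a hypothetical leverage ratio and a profitability measure, the 240/60 split, and the choice of k = 5 are all assumptions made for illustration only.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_ols(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def knn_predict(train_x, train_y, x, k=5):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


random.seed(0)

# Synthetic, purely illustrative data (an assumption, not the paper's dataset):
# x is a hypothetical leverage ratio, y a profitability measure with an
# inverted-U dependence on x plus a little noise.
xs = [random.uniform(0.0, 1.0) for _ in range(300)]
ys = [0.2 - 0.8 * (x - 0.5) ** 2 + random.gauss(0.0, 0.01) for x in xs]

# Hold out the last 60 observations for evaluation.
train_x, test_x = xs[:240], xs[240:]
train_y, test_y = ys[:240], ys[240:]

intercept, slope = fit_ols(train_x, train_y)
ols_pred = [intercept + slope * x for x in test_x]
knn_pred = [knn_predict(train_x, train_y, x) for x in test_x]

ols_rmse = rmse(test_y, ols_pred)
knn_rmse = rmse(test_y, knn_pred)

print(f"OLS hold-out RMSE: {ols_rmse:.4f}")
print(f"KNN hold-out RMSE: {knn_rmse:.4f}")
```

On data like this, a straight line cannot follow the curved relationship, so the non-parametric model gets the lower error, while the OLS intercept and slope stay directly readable. That mirrors the accuracy versus interpretability trade-off the abstract reports, though real financial panels would need proper feature engineering and cross-validation.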

Similar Papers
  • Research Article
  • Citations: 10
  • 10.1016/j.geoen.2023.212086
Machine learning approaches for formation matrix volume prediction from well logs: Insights and lessons learned
  • Jul 8, 2023
  • Geoenergy Science and Engineering
  • Pamidi Venkata Durga Kannaiah + 1 more

  • Conference Article
  • 10.2118/224836-ms
Comparative Study of Machine Learning and Artificial Neural Networks for Porosity and Permeability Prediction in Reservoir Characterization
  • May 12, 2025
  • Zayan Khursheed + 7 more

Accurate reservoir characterization is essential for optimizing hydrocarbon recovery, particularly through precise estimation of porosity and permeability. This study employs multiple supervised Machine Learning (ML) models, including Decision Tree, Gradient Boosting, K-Nearest Neighbors (KNN), Linear Regression, Lasso Regression, Ridge Regression, Random Forest, and Support Vector Regression (SVR), to predict petrophysical properties using well log data. Additionally, an Artificial Neural Network (ANN) model was evaluated to compare its performance with traditional ML approaches. The model performance was assessed using Root Mean Squared Error (RMSE) and the coefficient of determination (R²). Results indicate that Random Forest emerged as the most accurate model for permeability prediction (R² = 0.9236, RMSE = 117.29), outperforming ANN, which exhibited overfitting issues. Gradient Boosting also performed well (R² = 0.799) but slightly overestimated porosity. In contrast, traditional regression models (Linear and Ridge) were effective for porosity estimation but struggled with permeability variability, while Lasso Regression and SVR failed to establish meaningful patterns. The ANN model, despite its capability to capture complex relationships, demonstrated poor generalization due to overfitting, making ML models like Random Forest and Gradient Boosting more reliable for reservoir characterization. This study highlights the superiority of ensemble ML models over both conventional regression techniques and ANN in handling non-linear geological complexities. Future research should explore hybrid ML-ANN models, optimize hyperparameters, and integrate additional petrophysical parameters to further enhance predictive accuracy.

  • Research Article
  • 10.4108/ew.7114
Comparison of Machine Learning and Deep Learning Models Performance in predicting wind energy
  • Jul 21, 2025
  • EAI Endorsed Transactions on Energy Web
  • Saswati Rakshit + 1 more

The prediction of wind energy generation is important to enhance the performance and dependability of renewable energy systems due to the rising demand for wind-generated electricity and advancements in wind energy technology competitiveness. This study leverages advanced machine learning (ML) and some other statistical and deep learning based time series forecasting models to enhance the accuracy of wind energy predictions. This comprehensive analysis includes nine ML models: Linear Regression, Random Forests (RF), Gradient Boosting Machines (GBM), Support Vector Machines (SVM), K-Nearest Neighbors (KNN), AdaBoost, XGBoost, Support Vector Regression (SVR), and Neural Networks. It also includes four time-series forecasting models: ARIMA, Temporal Convolutional Networks (TCNs), Long Short-Term Memory (LSTM) networks, and GRU. Each ML model underwent rigorous cross-validation to ensure optimal performance. The assessment criteria comprised the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the R² score. Among the nine ML models, Random Forests, GBM, and KNN consistently provided superior accuracy and robustness, making them the top choices for wind energy prediction, whereas the performance of linear regression, SVM, and SVR was very poor for the considered dataset. Random Forest, GBM, and KNN showed the best performance with low MSE values of 0.77, 1.95, and 1.51 respectively, while other models had MSEs above 7.5, with AdaBoost reaching 30. Their RMSEs (0.88, 1.40, 1.23) and MAEs (0.093, 0.73, 0.10) also indicate strong predictive accuracy compared to the rest. Among the time series forecasting models, TCNs, LSTM, and GRU networks showed strong capabilities in capturing temporal dependencies and trends within the wind energy data. Visualization techniques were employed to compare model performances comprehensively, providing clear insights into their predictive power.
Therefore, this present study offers a robust framework for researchers and practitioners aiming to leverage machine learning and time series forecasting in the realm of renewable energy prediction.

  • Research Article
  • Citations: 21
  • 10.1016/j.conbuildmat.2022.129162
Improving asphalt mix design by predicting alligator cracking and longitudinal cracking based on machine learning and dimensionality reduction techniques
  • Nov 1, 2022
  • Construction and Building Materials
  • Jian Liu + 4 more

  • Research Article
  • Citations: 4
  • 10.1021/acssensors.5c00364
Enhancing the Predictive Performance of Molecularly Imprinted Polymer-Based Electrochemical Sensors Using a Stacking Regressor Ensemble of Machine Learning Models.
  • Apr 17, 2025
  • ACS sensors
  • Reza Mohammadi Dashtaki + 3 more

The performance of electrochemical sensors is influenced by various factors. To enhance the effectiveness of these sensors, it is crucial to find the right balance among these factors. Researchers and engineers continually explore innovative approaches to enhance sensitivity, selectivity, and reliability. Machine learning (ML) techniques facilitate the analysis and predictive modeling of sensor performance by establishing quantitative relationships between parameters and their effects. This work presents a case study on developing a molecularly imprinted polymer (MIP)-based sensor for detecting doxorubicin (Dox), emphasizing the use of ML-based ensemble models to improve performance and reliability. Four ML models, including Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), Random Forest (RF), and K-Nearest Neighbors (KNN), are used to evaluate the effect of each parameter on prediction performance, using the SHapley Additive exPlanations (SHAP) method to determine feature importance. Based on the analysis, removing a less influential feature and introducing a new feature significantly improved the model's predictive capabilities. By applying the min-max scaling technique, it is ensured that all features contribute proportionally to the model learning process. Additionally, multiple ML models─Linear Regression (LR), KNN, DT, RF, Adaptive Boosting (AdaBoost), Gradient Boosting (GB), Support Vector Regression (SVR), XGBoost, Bagging, Partial Least Squares (PLS), and Ridge Regression─are applied to the data set and their performance in predicting the sensor output current is compared. To further enhance prediction performance, a novel ensemble model is proposed that integrates DT, RF, GB, XGBoost, and Bagging regressors, leveraging their combined strengths to offset individual weaknesses. 
The main benefit of this work lies in its ability to enhance MIP-based sensor performance by developing a novel stacking regressor ensemble model, which improves prediction performance and reliability. This methodology is broadly applicable to the development of other sensors with different transducers and sensing elements. Through extensive simulation results, the proposed stacking regressor ensemble model demonstrated superior predictive performance compared to individual ML models. The model achieved an R-squared (R²) of 0.993, significantly reducing the root-mean-square error (RMSE) to 0.436 and the mean absolute error (MAE) to 0.244. These improvements enhanced sensitivity and reliability of the MIP-based electrochemical sensor, demonstrating a substantial performance gain over individual ML models.

  • Research Article
  • Citations: 41
  • 10.1016/j.conbuildmat.2022.126607
Optimizing asphalt mix design through predicting effective asphalt content and absorbed asphalt content using machine learning
  • Feb 7, 2022
  • Construction and Building Materials
  • Jian Liu + 4 more

  • Research Article
  • Citations: 8
  • 10.1186/s12302-025-01078-w
Comparative analysis of machine learning models for predicting water quality index in Dhaka’s rivers of Bangladesh
  • Mar 3, 2025
  • Environmental Sciences Europe
  • Mosaraf Hosan Nishat + 8 more

The pollution in Dhaka's navigable waterways, including the Buriganga, Balu, Tongi Khal, and Turag rivers, is a significant concern due to rapid industrial and urban expansion. Industrial discharges, domestic sewage and inadequate waste management are the primary sources of this pollution, degrading water quality and threatening aquatic ecosystems. This study aimed to predict the Water Quality Index (WQI) of these rivers using fourteen machine learning (ML) models: Decision Tree Regression, Linear Regression, Ridge Regression, Stochastic Gradient Descent (SGD) Regressor, Extreme Gradient Boosting (XGB) Regressor, Light Gradient Boosting Machine (GBM) Regressor, Elastic Net Regressor, Support Vector Regression (SVM), Random Forest Regression, Bayesian Ridge Regressor, Artificial Neural Network (ANN), AdaBoost Regressor, CatBoost Regressor and Extra Trees Regressor. The objective was to evaluate and compare these models to identify the most effective predictive method for WQI, enabling efficient environmental monitoring and management of urban waterways. Among the evaluated ML models, ANN and Random Forest Regressor performed the best. The ANN model demonstrated superior predictive capability, achieving a Root Mean Squared Error (RMSE) of 2.34, a Mean Absolute Error (MAE) of 1.24, a Nash–Sutcliffe Efficiency (NSE) of 0.97, and a Coefficient of Determination (R²) of 0.97. Furthermore, an Adjusted R² value of 0.965 further confirmed its ability to capture complex patterns in water quality data with remarkable accuracy. These findings emphasize the importance of using AI modeling techniques, specifically ANN and Random Forest Regression, to improve the accuracy of WQI forecasts for the waterways. This study contributes to the field of environmental science by offering a novel integration of feature selection techniques with ML models to enhance efficiency and cost-effectiveness of water quality monitoring.
Unlike previous studies, this research specifically addresses the challenges of urban waterways in Dhaka, Bangladesh, a region significantly impacted by industrial and urban pollution. To our knowledge, this is the first study to apply such a comprehensive range of ML models to predict the WQI of Dhaka’s four major rivers. By providing a reliable methodology for WQI estimation, this study supports informed decision-making and proactive measures to protect vital water resources.

  • Research Article
  • Citations: 28
  • 10.3390/ijgi9040276
Comparing Machine Learning Models and Hybrid Geostatistical Methods Using Environmental and Soil Covariates for Soil pH Prediction
  • Apr 23, 2020
  • ISPRS International Journal of Geo-Information
  • Panagiotis Tziachris + 4 more

In the current paper we assess different machine learning (ML) models and hybrid geostatistical methods in the prediction of soil pH using digital elevation model derivates (environmental covariates) and co-located soil parameters (soil covariates). The study was located in the area of Grevena, Greece, where 266 disturbed soil samples were collected from randomly selected locations and analyzed in the laboratory of the Soil and Water Resources Institute. The different models that were assessed were random forests (RF), random forests kriging (RFK), gradient boosting (GB), gradient boosting kriging (GBK), neural networks (NN), and neural networks kriging (NNK) and finally, multiple linear regression (MLR), ordinary kriging (OK), and regression kriging (RK) that although they are not ML models, they were used for comparison reasons. Both the GB and RF models presented the best results in the study, with NN a close second. The introduction of OK to the ML models’ residuals did not have a major impact. Classical geostatistical or hybrid geostatistical methods without ML (OK, MLR, and RK) exhibited worse prediction accuracy compared to the models that included ML. Furthermore, different implementations (methods and packages) of the same ML models were also assessed. Regarding RF and GB, the different implementations that were applied (ranger-ranger, randomForest-rf, xgboost-xgbTree, xgboost-xgbDART) led to similar results, whereas in NN, the differences between the implementations used (nnet-nnet and nnet-avNNet) were more distinct. Finally, ML models tuned through a random search optimization method were compared with the same ML models with their default values. The results showed that the predictions were improved by the optimization process only where the ML algorithms demanded a large number of hyperparameters that needed tuning and there was a significant difference between the default values and the optimized ones, like in the case of GB and NN, but not in RF. 
In general, the current study concluded that although RF and GB presented approximately the same prediction accuracy, RF had more consistent results, regardless of different packages, different hyperparameter selection methods, or even the inclusion of OK in the ML models’ residuals.

  • Research Article
  • 10.9734/psij/2025/v29i2874
Integrating the Asymptotic Iteration Method with Machine Learning for Predicting Vibrational Energy Levels of Diatomic Molecules
  • Mar 24, 2025
  • Physical Science International Journal
  • Omoriwhovo Jude Oghenekome + 1 more

Aim: This study integrates the Asymptotic Iteration Method (AIM) with Machine Learning (ML) models to enhance the prediction of vibrational energy levels in diatomic molecules. Traditional quantum mechanical methods, while accurate, are computationally demanding. This study aims to determine whether ML models can approximate these calculations efficiently while maintaining high accuracy. Methodology: The vibrational energy levels of Li₂, CN, and CO molecules were computed using AIM within the Morse potential framework. Three ML models—Random Forest (RF), Gradient Boosting (GB), and Support Vector Regression (SVR)—were trained using AIM-derived datasets. The models were evaluated using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R² score to assess their predictive performance. Results: SVR demonstrated the highest predictive accuracy, achieving a R² score of 0.999650, the lowest MAE and RMSE values of 0.124391 and 0.158412 respectively, outperforming RF and GB. The results indicate that ML models, particularly SVR, can effectively approximate AIM calculations with minimal error. Furthermore, 3D potential energy surface visualizations confirmed the strong agreement between ML and AIM predictions, validating the reliability of ML-based approaches. Conclusions: This study demonstrates that ML can serve as an efficient and scalable alternative to traditional quantum mechanical methods for predicting vibrational energy levels. The findings have implications for computational chemistry, spectroscopy, and materials science by reducing reliance on computationally intensive calculations. However, the study is limited by data generalization, as accuracy depends on the diversity of the training dataset. Future work should focus on expanding datasets, integrating deep learning techniques, and exploring hybrid AIM-ML models to improve generalizability and predictive robustness.

  • Research Article
  • 10.3390/healthcare13151805
The Relationship Between Smartphone and Game Addiction, Leisure Time Management, and the Enjoyment of Physical Activity: A Comparison of Regression Analysis and Machine Learning Models.
  • Jul 25, 2025
  • Healthcare (Basel, Switzerland)
  • Sevinç Namlı + 7 more

Background/Objectives: Smartphone addiction (SA) and gaming addiction (GA) have become risk factors for individuals of all ages in recent years. Especially during adolescence, it has become very difficult for parents to control this situation. Physical activity and the effective use of free time are the most important factors in eliminating such addictions. This study aimed to test a new machine learning method by combining routine regression analysis with the gradient-boosting machine (GBM) and random forest (RF) methods to analyze the relationship between SA and GA with leisure time management (LTM) and the enjoyment of physical activity (EPA) among adolescents. Methods: This study presents the results obtained using our developed GBM + RF hybrid model, which incorporates LTM and EPA scores as inputs for predicting SA and GA, following the preprocessing of data collected from 1107 high school students aged 15-19 years. The results were compared with those obtained using routine regression results and the lasso, ElasticNet, RF, GBM, AdaBoost, bagging, support vector regression (SVR), K-nearest neighbors (KNN), multi-layer perceptron (MLP), and light gradient-boosting machine (LightGBM) models. In the GBM + RF model, probability scores obtained from GBM were used as input to RF to produce final predictions. The performance of the models was evaluated using the R2, mean absolute error (MAE), and mean squared error (MSE) metrics. Results: Classical regression analyses revealed a significant negative relationship between SA scores and both LTM and EPA scores. Specifically, as LTM and EPA scores increased, SA scores decreased significantly. In contrast, GA scores showed a significant negative relationship only with LTM scores, whereas EPA was not a significant determinant of GA. In contrast to the relatively low explanatory power of classical regression models, ML algorithms have demonstrated significantly higher prediction accuracy. 
The best performance for SA prediction was achieved using the Hybrid GBM + RF model (MAE = 0.095, MSE = 0.010, R² = 0.9299), whereas the SVR model showed the weakest performance (MAE = 0.310, MSE = 0.096, R² = 0.8615). Similarly, the Hybrid GBM + RF model also showed the highest performance for GA prediction (MAE = 0.090, MSE = 0.014, R² = 0.9699). Conclusions: These findings demonstrate that classical regression analyses have limited explanatory power in capturing complex relationships between variables, whereas ML algorithms, particularly our GBM + RF hybrid model, offer more robust and accurate modeling capabilities for multifactorial cognitive and performance-related predictions.

  • Research Article
  • Citations: 11
  • 10.1038/s41598-025-98607-7
Comparative analysis of machine learning techniques for temperature and humidity prediction in photovoltaic environments
  • May 5, 2025
  • Scientific Reports
  • Montaser Abdelsattar + 2 more

This research conducts a comparative analysis of nine Machine Learning (ML) models for temperature and humidity prediction in Photovoltaic (PV) environments. Using a dataset of 5,000 samples (80% for training, 20% for testing), the models—Support Vector Regression (SVR), Lasso Regression, Ridge Regression (RR), Linear Regression (LR), AdaBoost, Gradient Boosting (GB), Decision Tree (DT), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost)—were evaluated based on Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R²). For temperature prediction, XGBoost demonstrated the best performance, achieving the lowest MAE of 1.544, the lowest RMSE of 1.242, and the highest R² of 0.947, indicating strong predictive accuracy. Conversely, SVR had the weakest performance with an MAE of 4.558 and an R² of 0.674. Similarly, for humidity prediction, XGBoost outperformed other models, achieving an MAE of 3.550, RMSE of 1.884, and R² of 0.744, while SVR exhibited the lowest predictive power with an R² of 0.253. This comprehensive study serves as a benchmark for the application of ML models to environmental prediction in PV systems, a research area that is relatively important. Notably, the results underscore the performance advantage of ensemble-based approaches, especially for XGBoost and RF compared to simpler, linear-based methods such as LR and SVR, when it comes to well-dispersed environmental interactions. The proposed machine-learning based power generation analysis approach shows significant improvements in predictive analytics capabilities for renewable energy systems, as well as a means for real-time monitoring and maintenance practices to improve PV performance and reliability.

  • Research Article
  • Citations: 15
  • 10.1016/j.cscm.2024.e03092
A comprehensive comparison of various machine learning algorithms used for predicting the splitting tensile strength of steel fiber-reinforced concrete
  • Mar 27, 2024
  • Case Studies in Construction Materials
  • Seyed Soroush Pakzad + 2 more

  • Research Article
  • Citations: 2
  • 10.1115/1.4067131
Evaluation of Machine Learning Models for Predicting the Hot Deformation Flow Stress of Sintered Al–Zn–Mg Alloy
  • Nov 28, 2024
  • Journal of Engineering Materials and Technology
  • Katika Harikrishna + 2 more

In predicting flow stress, machine learning (ML) offers significant advantages by leveraging data-driven approaches, enhancing material design, and accurately forecasting material performance. Thus, the present study employs various supervised ML models, including linear regression (Lasso and Ridge), support vector regression (SVR), ensemble methods (random forest (RF), gradient boosting (GB), extreme gradient boosting (XGB)), and neural networks (artificial neural network (ANN), multilayer perceptron (MLP)), to predict flow stress in the hot deformation of an Al–Zn–Mg alloy. The ML methodology involves sequential steps from data extraction to cross-validation and hyperparameter tuning, which is conducted using the hyperopt library. Model performance is assessed using average absolute relative error (AARE), root-mean-squared error (RMSE), and mean squared error (MSE). The results show that ensemble methods (RF, GB, XGB) and neural networks outperform traditional regression methods, demonstrating superior predictive accuracy. Visualization using half-violin plots reveals the models' error ranges, with XGB consistently exhibiting the best performance. SVR, RF, GB, XGB, ANN, and MLP showed better performance than the Arrhenius model in the context of AARE and MSE metrics. Interestingly, SVR had a somewhat higher AARE of 1.89% and an MSE of 0.251 MPa², while XGB had the lowest AARE of 0.2% and the lowest MSE of 0.011 MPa². When ML models were evaluated using the skill score relative to the Arrhenius model, XGB scored 0.986, higher than the support vector machine (SVM) at 0.714. In contrast, Lasso and Ridge exhibited negative scores of −0.847 and −0.456, respectively.

  • Research Article
  • Citations: 20
  • 10.1371/journal.pone.0317619
Evaluating Machine Learning and Deep Learning models for predicting Wind Turbine power output from environmental factors.
  • Jan 23, 2025
  • PloS one
  • Montaser Abdelsattar + 4 more

This study presents a comprehensive comparative analysis of Machine Learning (ML) and Deep Learning (DL) models for predicting Wind Turbine (WT) power output based on environmental variables such as temperature, humidity, wind speed, and wind direction. Along with Artificial Neural Network (ANN), Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN), and Convolutional Neural Network (CNN), the following ML models were looked at: Linear Regression (LR), Support Vector Regressor (SVR), Random Forest (RF), Extra Trees (ET), Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). Using a dataset of 40,000 observations, the models were assessed based on R-squared, Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). ET achieved the highest performance among ML models, with an R-squared value of 0.7231 and a RMSE of 0.1512. Among DL models, ANN demonstrated the best performance, achieving an R-squared value of 0.7248 and a RMSE of 0.1516. The results show that DL models, especially ANN, did slightly better than the best ML models. This means that they are better at modeling non-linear dependencies in multivariate data. Preprocessing techniques, including feature scaling and parameter tuning, improved model performance by enhancing data consistency and optimizing hyperparameters. When compared to previous benchmarks, the performance of both ANN and ET demonstrates significant predictive accuracy gains in WT power output forecasting. This study's novelty lies in directly comparing a diverse range of ML and DL algorithms while highlighting the potential of advanced computational approaches for renewable energy optimization.

  • Research Article
  • Citations: 37
  • 10.1016/j.jenvman.2023.119866
Optimisation and interpretation of machine and deep learning models for improved water quality management in Lake Loktak
  • Dec 25, 2023
  • Journal of Environmental Management
  • Swapan Talukdar + 7 more
