Artificial Intelligence vs. Efficient Markets: A Critical Reassessment of Predictive Models in the Big Data Era
This paper critically examines artificial intelligence applications in stock market forecasting, addressing significant gaps in the existing literature that often overlook the tension between theoretical market efficiency and empirical predictability. While numerous reviews catalog methodologies, they frequently fail to rigorously evaluate model performance across different market regimes or reconcile statistical significance with economic relevance. We analyze techniques ranging from traditional statistical models to advanced deep learning architectures, finding that ensemble methods like Extra Trees, Random Forest, and XGBoost consistently outperform single classifiers, achieving directional accuracy of up to 86% in specific market conditions. Our analysis reveals that hybrid approaches integrating multiple data sources demonstrate superior performance by capturing complementary market signals, yet many models showing statistical significance fail to generate economic value after accounting for transaction costs and market impact. By addressing methodological challenges including backtest overfitting, regime changes, and implementation constraints, we provide a novel comprehensive framework for rigorous model assessment that bridges the divide between academic research and practical implementation. This review makes three key contributions: (1) a reconciliation of the Efficient Market Hypothesis with AI-driven predictability through an adaptive market framework, (2) a multi-dimensional evaluation methodology that extends beyond classification accuracy to financial performance, and (3) an identification of promising research directions in explainable AI, transfer learning, causal modeling, and privacy-preserving techniques that address current limitations.
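To make the statistical-versus-economic distinction above concrete, the short sketch below contrasts directional accuracy with net strategy returns once a simple per-trade cost is charged. The synthetic return series, the sign-based strategy, and the 10 bps cost level are illustrative assumptions only, not the paper's evaluation framework.

```python
# Minimal sketch: directional accuracy vs. economic value after transaction costs.
# `returns` are placeholder daily asset returns and `pred_dir` are noisy predicted
# directions (+1/-1); the 10 bps cost per position change is an assumed figure.
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.0005, 0.01, 1000)                    # synthetic daily returns
pred_dir = np.sign(returns + rng.normal(0, 0.01, 1000))     # noisy direction forecasts

directional_accuracy = np.mean(np.sign(returns) == pred_dir)

gross = pred_dir * returns                                  # strategy return before costs
turnover = np.abs(np.diff(pred_dir, prepend=pred_dir[0])) / 2
net = gross - 0.0010 * turnover                             # 10 bps per position switch

print(f"directional accuracy:    {directional_accuracy:.2%}")
print(f"gross mean daily return: {gross.mean():.5f}")
print(f"net mean daily return:   {net.mean():.5f}")
```

A model can clear the accuracy hurdle yet fail the net-return one, which is the gap the review's multi-dimensional evaluation targets.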
- Research Article
- 10.1002/cjce.25738
- May 4, 2025
- The Canadian Journal of Chemical Engineering
Miscible gas injection techniques, such as nitrogen injection, are among the most attractive enhanced oil recovery (EOR) techniques for improving oil recovery factors in oil reservoirs. A key challenge in implementing these techniques is accurately determining the minimum miscibility pressure (MMP). While laboratory experiments offer reliable results, they are costly and time-consuming, and existing empirical correlations often have moderate accuracy, which limits their practical use. In this study, robust ensemble methods, namely light gradient boosting machine (LightGBM), extra trees (ET), and categorical boosting (CatBoost), were implemented for modelling MMP in pure nitrogen and nitrogen-containing gas mixture–crude oil systems. An experimental database of 164 data points was used to develop the predictive models. The findings revealed that the proposed ensemble methods achieved outstanding accuracy on both the training and test datasets, with ET consistently outperforming the other models. The ET model provided the most consistent MMP predictions, with a total root mean square error (RMSE) of only 0.3197 MPa and a determination coefficient of 0.9976, and it exhibited very small RMSE values across a broad range of operational conditions. The Shapley additive explanations (SHAP) method further supported the interpretability of the ET model, providing clear insights into the impact of input features. This study underlines the significant potential of machine learning to enhance MMP prediction in pure nitrogen and nitrogen-containing gas mixture–crude oil systems, thereby aiding in the appropriate design of this kind of EOR process and supporting better decision-making in reservoir management.
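A minimal sketch of the kind of pipeline described above: an extra-trees regressor fitted to tabular MMP-style data and interpreted with SHAP. The synthetic data, feature count, and hyperparameters are illustrative assumptions rather than the paper's actual 164-point database or tuned models, and the third-party `shap` package is assumed to be installed.

```python
# Illustrative sketch only: ExtraTreesRegressor for MMP-style tabular regression,
# interpreted with SHAP. Data, feature names, and hyperparameters are placeholders.
import numpy as np
import shap  # assumed installed: pip install shap
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.random((164, 5))                        # stand-ins for temperature, oil composition, N2 fraction, ...
y = 10 + 30 * X[:, 0] + rng.normal(0, 1, 164)   # synthetic MMP values in MPa

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
model = ExtraTreesRegressor(n_estimators=300, random_state=42).fit(X_tr, y_tr)

rmse = np.sqrt(np.mean((model.predict(X_te) - y_te) ** 2))
print(f"test RMSE: {rmse:.3f} MPa")

# Per-feature SHAP contributions on the test set (global importance = mean |SHAP|).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
print("mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0))
```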
- Research Article
- 10.1007/s44257-025-00037-2
- Jul 10, 2025
- Discover Analytics
Queuing up for a service is sometimes an inevitable experience. The inefficiencies brought on by extended waiting times can be considerably decreased by precise waiting time prediction. Accurate prediction can substantially improve consumer satisfaction by reducing uncertainty. It is possible to introduce a robust approach to the prediction of waiting times based on previous queuing data and artificial intelligence (AI) algorithms. This paper contributes to the field by offering a robust approach to waiting time prediction and suggests potential directions for further research. The investigation leverages ensemble tree-based methods along with one statistical model, supplemented by various data pre-processing techniques for regression analysis to forecast precise waiting times. The following regression models have been used to assess the performance: Random Forest (RF), Extra Trees (ET), Gradient Boosting (GBR), Histogram-Based Gradient Boosting (HGBR), Voting (VR) and Ridge Regression. Among these, the ET Regressor demonstrates superior performance. Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), and Autoencoders have been evaluated to compare the effectiveness of different dimensionality reduction techniques. Furthermore, the challenge of data imbalance in classification tasks has also been addressed here using the Synthetic Minority Oversampling Technique (SMOTE). This process impressively enhances classification accuracy, especially for minority classes. Transparency and trustworthiness in the predictive system have been ensured through the use of Explainable Artificial Intelligence (XAI) techniques, which help interpret the decision-making processes of the models.
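A compact sketch of the regressor comparison the abstract enumerates, run on synthetic queue features; the model names mirror the list above, while the data, features, and cross-validation setup are illustrative assumptions only.

```python
# Illustrative comparison of the named regressors on placeholder queue data;
# features and waiting-time target are synthetic stand-ins for historical records.
import numpy as np
from sklearn.ensemble import (RandomForestRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor, HistGradientBoostingRegressor,
                              VotingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.random((500, 4))                                      # e.g., queue length, arrival rate, staffing, hour
y = 5 + 20 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 2, 500)   # synthetic waiting time (minutes)

models = {
    "RF": RandomForestRegressor(random_state=0),
    "ET": ExtraTreesRegressor(random_state=0),
    "GBR": GradientBoostingRegressor(random_state=0),
    "HGBR": HistGradientBoostingRegressor(random_state=0),
    "Ridge": Ridge(),
}
models["VR"] = VotingRegressor([(name, est) for name, est in models.items()])

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
    print(f"{name}: RMSE = {-scores.mean():.2f} min")
```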
- Research Article
- 10.30574/wjaets.2025.15.2.0635
- May 30, 2025
- World Journal of Advanced Engineering Technology and Sciences
The rapid advancements in artificial intelligence and machine learning have led to the development of highly sophisticated models capable of superhuman performance in a variety of tasks. However, the increasing complexity of these models has also resulted in them becoming "black boxes", where the internal decision-making process is opaque and difficult to interpret. This lack of transparency and explainability has become a significant barrier to the widespread adoption of these models, particularly in sensitive domains such as healthcare and finance. To address this challenge, the field of Explainable AI has emerged, focusing on developing new methods and techniques to improve the interpretability and explainability of machine learning models. This review paper aims to provide a comprehensive overview of the research exploring the combination of Explainable AI and traditional machine learning approaches, known as "hybrid models". This paper discusses the importance of explainability in AI, and the necessity of combining interpretable machine learning models with black-box models to achieve the desired trade-off between accuracy and interpretability. It provides an overview of key methods and applications, integration techniques, implementation frameworks, evaluation metrics, and recent developments in the field of hybrid AI models. The paper also delves into the challenges and limitations in implementing hybrid explainable AI systems, as well as the future trends in the integration of explainable AI and traditional machine learning. Altogether, this paper will serve as a valuable reference for researchers and practitioners working on developing explainable and interpretable AI systems.
Keywords: Explainable AI (XAI), Traditional Machine Learning (ML), Hybrid Models, Interpretability, Transparency, Predictive Accuracy, Neural Networks, Ensemble Methods, Decision Trees, Linear Regression, SHAP (Shapley Additive Explanations), LIME (Local Interpretable Model-agnostic Explanations), Healthcare Analytics, Financial Risk Management, Autonomous Systems, Predictive Maintenance, Quality Control, Integration Techniques, Evaluation Metrics, Regulatory Compliance, Ethical Considerations, User Trust, Data Quality, Model Complexity, Future Trends, Emerging Technologies, Attention Mechanisms, Transformer Models, Reinforcement Learning, Data Visualization, Interactive Interfaces, Modular Architectures, Ensemble Learning, Post-Hoc Explainability, Intrinsic Explainability, Combined Models
- Research Article
- 10.3390/computers14090374
- Sep 8, 2025
- Computers
Artificial intelligence (AI) is rapidly redefining both computer science and cybersecurity by enabling more intelligent, scalable, and privacy-conscious systems. While most prior surveys treat these fields in isolation, this paper provides a unified review of 256 peer-reviewed publications to bridge that gap. We examine how emerging AI paradigms, such as explainable AI (XAI), AI-augmented software development, and federated learning, are shaping technological progress across both domains. In computer science, AI is increasingly embedded throughout the software development lifecycle to boost productivity, improve testing reliability, and automate decision making. In cybersecurity, AI drives advances in real-time threat detection and adaptive defense. Our synthesis highlights powerful cross-cutting findings, including shared challenges such as algorithmic bias, interpretability gaps, and high computational costs, as well as empirical evidence that AI-enabled defenses can reduce successful breaches by up to 30%. Explainability is identified as a cornerstone for trust and bias mitigation, while privacy-preserving techniques, including federated learning and local differential privacy, emerge as essential safeguards in decentralized environments such as the Internet of Things (IoT) and healthcare. Despite transformative progress, we emphasize persistent limitations in fairness, adversarial robustness, and the sustainability of large-scale model training. By integrating perspectives from two traditionally siloed disciplines, this review delivers a unified framework that not only maps current advances and limitations but also provides a foundation for building more resilient, ethical, and trustworthy AI systems.
- Research Article
- 10.1007/s10614-025-11024-w
- Jun 26, 2025
- Computational Economics
Stock market forecasting is a challenging research problem due to the complexity of the factors influencing stock market trends. This survey provides a comprehensive overview of recent advancements in stock market forecasting, focusing on the impact of large language models (LLMs) in financial analytics. The survey explores the strengths and challenges of feature engineering, ensemble methods, hybrid models, text-based prediction and reinforcement learning. It then presents the transformative impact of LLMs, highlighting their capabilities in utilizing transfer learning and few-shot learning to understand complex financial information, enhancing sentiment analysis, improving portfolio management, and stock forecasting accuracy. A key novelty of this survey lies in presenting a comprehensive analysis of the strengths and weaknesses of LLMs for different financial tasks, in addition to exploring how LLMs can be combined with machine learning and reinforcement learning approaches to overcome their limitations in handling unstructured data, improving model explainability, and enhancing generalizability. Finally, this survey identifies existing research gaps and limitations, proposing future research directions aimed at improving prediction accuracy and utilizing the capabilities of both LLMs and predictive models in stock market forecasting.
- Research Article
- 10.1200/jco.2024.42.4_suppl.688
- Feb 1, 2024
- Journal of Clinical Oncology
688 Background: Internationally, avelumab is approved as maintenance therapy for patients (pts) with LA/mUC whose disease did not progress after 1L platinum-based chemotherapy. However, 54% of pts progressed on avelumab, and limited data are available on predictive biomarkers of efficacy. Artificial intelligence (AI) methods are increasingly being investigated to generate predictive models applicable in clinical practice. In this study, we developed a set of machine learning (ML) classifiers and survival analysis algorithms using real-world data to predict response and progression-free survival (PFS) in LA/mUC patients treated with avelumab, and we applied explainability techniques to the developed algorithms. Methods: We prospectively collected real-world data from 115 pts receiving avelumab from 2021 to 2022, treated in 20 institutions in Italy (MALVA dataset). To predict the efficacy of immunotherapy (IO), two different outcomes were studied: Objective Response Rate (ORR) and Progression-Free Survival (PFS). The dataset was split into training and test sets with an 80%-20% ratio. Missing values were imputed using a Bayesian Ridge iterative imputer fitted on the training set. Eight different classifier models were used for ORR: XGBoost (XGB), Logistic Regression (LR), Random Forest (RF), Multilayer Perceptron (MLP), Support Vector Machine (SVM), AdaBoost (AB), Extra Trees (ET), and LightGBM (LGBM). Five ML survival analysis models were used to analyse PFS: Cox Proportional Hazards (CPH), Random Survival Forest (RSF), Gradient Boosting (GB), Extra Survival Trees (EST), and Survival Support Vector Machine (SSVM). Finally, SHAP values, an eXplainable AI (XAI) technique, were calculated to evaluate each feature and to explain the predictions. Results: Thirty-one features were selected on the basis of clinical expertise and hypotheses. For ORR prediction, the two best-performing models were XGB and ET, both without oversampling. On the test set, XGB achieved an F1 score of 0.77, accuracy of 0.77, and AUC of 0.81, while ET reached an F1 score and accuracy of 0.81 and an AUC of 0.80. For the prediction of PFS, EST and RSF obtained the best performances, with c-indices of 0.71 and 0.72 and average AUCs of 0.75 and 0.76, respectively. According to SHAP, the most important feature for predicting ORR was ORR after 1st-line CHT, while bone metastases, absolute leukocyte count at baseline, and ECOG PS were the most important features for PFS prediction. Conclusions: Machine learning is useful for predicting efficacy in advanced urothelial carcinoma. The explainability models confirmed findings from recent years of immunotherapy research, conferring trustworthiness on the ML models. Further validation of these approaches on larger, external pts cohorts is needed.
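A hedged sketch of the preprocessing-and-classification steps described in the Methods: an 80/20 split, a Bayesian Ridge iterative imputer fitted on the training set, and one of the listed classifiers (Extra Trees). The data here are synthetic placeholders, since the MALVA dataset is not public, and the hyperparameters are scikit-learn defaults rather than the authors' choices.

```python
# Sketch only: 80/20 split, BayesianRidge-based iterative imputation fitted on the
# training fold, then an extra-trees classifier for a binary response outcome.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((115, 31))                     # 115 patients x 31 clinically selected features (synthetic)
X[rng.random(X.shape) < 0.1] = np.nan         # simulate missing clinical values
y = rng.integers(0, 2, 115)                   # synthetic response labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
imputer = IterativeImputer(estimator=BayesianRidge(), random_state=0).fit(X_tr)
X_tr, X_te = imputer.transform(X_tr), imputer.transform(X_te)

clf = ExtraTreesClassifier(random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]
print("F1:", f1_score(y_te, clf.predict(X_te)), "AUC:", roc_auc_score(y_te, proba))
```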
- Research Article
- 10.54691/bcpbm.v32i.2855
- Nov 22, 2022
- BCP Business & Management
With the development of global economic integration, the stock market occupies an important position in the global economy, and accurately predicting it is of significant social and economic value. The stock market generates huge volumes of data, and capturing the hidden patterns in these data to produce accurate predictions poses new challenges. With the vigorous development of data mining technology and increasingly rich data samples, the value of data is more fully recognized and more widely studied, and business data analysis, for example machine learning and data mining, has gradually been applied to stock market forecasting. This paper studies business analysis in the era of big data so that the stock market can yield higher economic benefits. For the stock market, machine learning, statistical inference, and other methods can serve as theoretical research tools, but practical application requires preparation and refinement according to the real market environment.
- Research Article
- 10.37745/ejcsit.2013/vol13n433951
- May 15, 2025
- European Journal of Computer Science and Information Technology
This article examines the integration of artificial intelligence in financial evaluation and the vital role of explainability in building trustworthy decision support systems. As AI transforms traditional financial evaluation from forecasting to portfolio management, the inherent opacity of sophisticated algorithms creates tension with the financial sector's transparency requirements. The discussion explores how Explainable AI techniques—particularly SHAP values and LIME—enable financial professionals to understand AI-generated insights while maintaining regulatory compliance. Through examining real-world implementations, the article demonstrates quantifiable benefits of explainable models in reducing false positives, improving analyst confidence, and accelerating regulatory approval. The evaluation extends to comprehensive Responsible AI frameworks encompassing fairness and bias mitigation, privacy-preserving techniques, and adversarial resilience mechanisms. The discussion addresses how generative AI assistants revolutionize document evaluation by automating summarization and data extraction while confronting critical security challenges, including prompt injection attacks, data leakage, and regulatory compliance complexities. The article emphasizes human-in-the-loop paradigms and tiered governance frameworks that successfully balance innovation with appropriate oversight, while examining real-time explainability challenges and monitoring requirements. Forward-looking perspectives on regulatory harmonization and the convergence of explainable, privacy-preserving, and robust AI systems demonstrate the evolution toward trustworthy financial AI implementations.
- Research Article
- 10.1186/s40537-025-01238-y
- Jul 17, 2025
- Journal of Big Data
Classifying Electroencephalogram (EEG) signals for wheelchair navigation presents significant challenges due to high dimensionality, noise, outliers, and class imbalances. This study proposes an optimized classification framework that evaluates ten machine learning (ML) models, emphasizing ensemble methods, feature selection (FS), and outlier utilization. The dataset, comprising 2869 samples and 141 features, was processed using Recursive Feature Elimination (RFE) and correlation thresholds (CTs), achieving a peak accuracy of 69% with Extra Trees after FS. Notably, training on outlier-only data yielded even higher accuracy (Extra Trees: 82%), underscoring the value of outliers in enhancing class separability. Receiver Operating Characteristic–Precision Recall (ROC-PR) curve analysis confirmed that Extra Trees achieved a ROC AUC (Area Under Curve) of 0.92 and PR AUC of 0.82 for the best-classified movement command, while other models exhibited lower precision-recall (PR) balance. This approach, complemented by explainability techniques, offers a robust solution for EEG-based wheelchair control systems and paves the way for interpretable brain-computer interfaces (BCIs).
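A hedged sketch of the feature-selection step described above: recursive feature elimination (RFE) wrapped around an extra-trees classifier, followed by ROC AUC and PR AUC evaluation. The simulated 141-feature matrix, the binary target, and the number of retained features are assumptions for illustration; the study's actual data and multi-class movement commands are not reproduced here.

```python
# Illustrative RFE + ExtraTrees pipeline on a simulated EEG-style feature matrix.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2869, 141))                              # placeholder EEG features
y = (X[:, 0] + X[:, 1] + rng.normal(0, 1, 2869) > 0).astype(int)  # binary stand-in command

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

# Drop 10 features per iteration until 30 remain (both counts are assumed values).
selector = RFE(ExtraTreesClassifier(random_state=0), n_features_to_select=30, step=10)
X_tr_sel = selector.fit_transform(X_tr, y_tr)
X_te_sel = selector.transform(X_te)

clf = ExtraTreesClassifier(random_state=0).fit(X_tr_sel, y_tr)
scores = clf.predict_proba(X_te_sel)[:, 1]
print("ROC AUC:", roc_auc_score(y_te, scores))
print("PR AUC: ", average_precision_score(y_te, scores))
```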
- Research Article
- 10.1371/journal.pone.0260373
- Feb 3, 2022
- PloS one
The formation of an efficient market depends on the competition between different investment strategies, which accelerates the incorporation of all available information into asset prices. By incorporating market impact and two kinds of investment strategies into an agent-based model, we have investigated the coevolutionary mechanism of different investment strategies and the role of market impact in shaping competitive advantage in financial markets. The coevolution of history-dependent strategies and reference point strategies depends on the levels of market impact and risk tolerance. For low market impact and low risk tolerance, the majority-win effect makes trend-following strategies dominant. For high market impact and low risk tolerance, the minority-win effect makes trend-rejecting strategies coupled with trend-following strategies dominant. The coupled effects of price fluctuations and strategy distributions have been investigated in depth. A U-shaped distribution of history-dependent strategies is beneficial for a stable price, which is destroyed by the existence of reference point strategies with low risk tolerance. A δ-like distribution of history-dependent strategies leads to large price fluctuations, which are suppressed by the existence of reference point strategies with high risk tolerance. The strategies that earn more in an inefficient market lose more in an efficient market. This result gives us another explanation for the principle of risk-profit equilibrium in financial markets: high return in an inefficient market should be coupled with high risk in an efficient market, and low return in an inefficient market should be coupled with low risk in an efficient market.
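For readers unfamiliar with the minority-win mechanism the abstract invokes, the toy simulation below rewards agents who end up on the less-crowded side of the market in each round. It is a generic minority-game sketch under illustrative parameters, not the paper's agent-based model with market impact, history-dependent strategies, and reference point strategies.

```python
# Toy minority game: each round, agents on the minority side gain and the majority loses.
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_steps = 101, 500
wealth = np.zeros(n_agents)

for _ in range(n_steps):
    actions = rng.choice([-1, 1], size=n_agents)   # +1 = buy, -1 = sell (random agents)
    aggregate = actions.sum()                      # positive: buyers form the majority
    wealth += -actions * np.sign(aggregate)        # minority-win payoff rule

print("mean wealth:", wealth.mean(), "| std across agents:", wealth.std())
```

Because the minority is always the smaller group, average wealth drifts down, which is the basic tension that makes crowded (trend-following) strategies unprofitable under strong market impact.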
- Research Article
- 10.32628/ijsrset2411322
- Jun 10, 2024
- International Journal of Scientific Research in Science, Engineering and Technology
Diabetes Mellitus (DM) is a persistent health issue in many countries and is a leading cause of heart disease, kidney failure, and blindness. The International Diabetes Federation (IDF) estimated in 2019 that at least 463 million people worldwide aged 20-79 suffer from diabetes, a number expected to rise to 578 million by 2030 and 700 million by 2045. Machine learning is very helpful in various fields, including healthcare. In classification tasks, ensemble methods combine decisions from several other models, one way being through majority voting, and often produce more accurate classification or prediction results. Ensemble methods include random forest, extra trees, rotation forest, and double random forest. The data used in this study are part of research on the development and clinical testing of a prototype non-invasive blood glucose monitoring device by the non-invasive biomarking team at IPB, and they include both invasive and non-invasive blood glucose measurements collected in 2019. This study compares the performance of the random forest, extra trees, rotation forest, and double random forest models on blood glucose level data obtained from non-invasive devices. The results show that the Rotation Forest algorithm is the best model, with the highest average accuracy among the four algorithms, achieving an accuracy of 0.7142857 (71.43%).
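Rotation forest and double random forest are not available in scikit-learn, so the sketch below illustrates only the majority-voting idea mentioned above, combining random forest, extra trees, and a logistic-regression baseline on placeholder glucose-level features; all data and settings are assumptions.

```python
# Minimal majority-voting ensemble on synthetic stand-ins for non-invasive sensor features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 6))                                              # placeholder sensor readings
y = (X[:, 0] + 0.3 * rng.standard_normal(200) > 0.5).astype(int)      # synthetic glucose class labels

vote = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("et", ExtraTreesClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="hard",   # hard voting = simple majority vote over predicted classes
)
print("CV accuracy:", cross_val_score(vote, X, y, cv=5).mean())
```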
- Research Article
- 10.1016/j.energy.2018.08.207
- Aug 30, 2018
- Energy
Tree-based ensemble methods for predicting PV power generation and their comparison with support vector regression
- Research Article
- 10.1038/s41598-025-02355-7
- May 22, 2025
- Scientific Reports
Chronic kidney disease is a persistent ailment marked by the gradual decline of kidney function. Its classification primarily relies on the estimated glomerular filtration rate and the presence of kidney damage. The Kidney Disease: Improving Global Outcomes (KDIGO) organization has established a widely accepted system for categorizing chronic kidney disease. Explainable artificial intelligence for classification involves creating machine learning models that not only accurately predict outcomes but also offer clear and interpretable explanations for their decisions. Traditional machine learning models often make it difficult to comprehend the processes behind specific classification choices due to their complex and opaque nature. In this study, an explainable artificial intelligence-chronic kidney disease model is introduced for classification. The model applies explainable artificial intelligence by utilizing extra trees and Shapley additive explanations (SHAP) values. Also, a binary breadth-first search algorithm is used to select the most important features for the proposed explainable artificial intelligence-chronic kidney disease model. This methodology is designed to derive valuable insights for enhancing decision-making strategies in the classification of chronic kidney disease. The performance of the proposed model is compared with other machine learning models, namely random forest, decision tree, bagging classifier, adaptive boosting, and k-nearest neighbors, and the models are evaluated using accuracy, sensitivity, specificity, F-score, and area under the ROC curve. The experimental results demonstrated that the proposed model achieved the best results, with an accuracy of 99.9%.
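The evaluation metrics listed above can all be derived from a confusion matrix and class scores, as in the short sketch below; the labels and scores are placeholder values rather than outputs of the proposed model.

```python
# Accuracy, sensitivity, specificity, F-score, and ROC AUC from placeholder predictions.
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])                       # 1 = CKD, 0 = non-CKD (toy labels)
y_score = np.array([0.9, 0.2, 0.8, 0.7, 0.4, 0.1, 0.6, 0.3, 0.95, 0.55])  # toy predicted probabilities
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)    # recall on the positive (CKD) class
specificity = tn / (tn + fp)    # recall on the negative class

print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
      f"F1={f1_score(y_true, y_pred):.2f} AUC={roc_auc_score(y_true, y_score):.2f}")
```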
- Conference Article
- 10.5339/qfarc.2016.ictsp1534
- Jan 1, 2016
Motivation: Chronic kidney disease (CKD) refers to the loss over time of kidney function, whose primary role is to filter blood. Based on its severity, CKD can be classified into various stages, with the later ones requiring regular dialysis or a kidney transplant. Chronic kidney disease mostly affects patients suffering from the complications of diabetes or high blood pressure and hinders their ability to carry out day-to-day activities. In Qatar, due to the rapidly changing lifestyle, there has been an increase in the number of patients suffering from CKD. According to Hamad Medical Corporation [2], about 13% of Qatar's population suffers from CKD, whereas the global prevalence is estimated to be around 8–16% [3]. CKD can be detected at an early stage by simple tests that measure blood pressure, serum creatinine, and urine albumin, which can help protect at-risk patients from complete kidney failure [1]. Our goal is to use machine learning techniques and build a classification model that can predict if an individ...
- Research Article
- 10.1051/shsconf/202419401003
- Jan 1, 2024
- SHS Web of Conferences
In recent years, the swift progress of artificial intelligence (AI) has significantly influenced trading practices, providing traders with advanced algorithms that improve decision-making and enhance trading strategies, leading to increased profits and reduced risks. The onset of the era of big data has further enriched this field, offering access to extensive financial data, such as historical stock prices, company financial statements, financial news articles, social media sentiments, and macroeconomic indicators—all publicly available. By identifying complex patterns and correlations within this vast data set, deep learning (DL) algorithms have proven their ability to predict stock prices and market trends more accurately than traditional methods. This comprehensive survey aims to provide an insightful examination of various deep-learning models employed in stock market forecasting. The primary objective is to categorize these models into two distinct types: uni-modal and multi-modal models. By exploring the nuances within each category, this literature survey provides a comprehensive understanding of these models’ strengths, applications, and contributions to the constantly evolving research landscape of stock market forecasting. Our survey adopts a systematic approach to categorize and analyze deep-learning models in stock market forecasting. Leveraging established databases and repositories, we will compile a comprehensive dataset comprising academic articles, conference papers, and other scholarly publications related to DL in finance. This dataset will span a defined period, allowing us to capture the temporal evolution of research trends in stock market prediction. The first phase involves extracting and compiling relevant literature from established databases, including but not limited to Scopus, Web of Science, and Google Scholar. This dataset will serve as the foundation for exploring the evolving landscape of DL applications in stock market forecasting. Subsequently, advanced techniques and methodologies will be employed to analyze citation patterns, model co-occurrence, and the intellectual structure of research in this domain. Our research identifies influential authors, collaboration networks, and the geographical distribution of research activities to uncover emerging clusters of research excellence. The findings of this survey contribute valuable insights to both academia and industry. By categorizing and examining the strengths of uni-modal and multi-modal deep-learning models, researchers can refine their methodologies, and practitioners can make informed decisions regarding the adoption of predictive models in financial markets. Furthermore, the survey aims to guide future research directions, enhancing the overall effectiveness of predictive models in the dynamic landscape of stock market forecasting. In conclusion, this survey aims to provide a comprehensive overview of deep-learning models in stock market forecasting. By systematically categorizing and analyzing these models, our study aspires to contribute to the ongoing dialogue on integrating AI in financial practices, fostering a deeper understanding of the field’s evolution and future directions.