Identifying Factors Influencing Box-Office Success of Motion Pictures with XAI

Abstract

The film industry, a multi-billion-dollar sector, faces persistent uncertainty in predicting box-office success. Despite advances in machine learning (ML), existing models often lack transparency and fail to incorporate dynamic audience feedback. This study proposes an explainable artificial intelligence (XAI)-powered decision support framework to identify and interpret the risk factors influencing movie profitability. Using data from YouTube Trailer Reviews, BoxOfficeMojo, IMDb, and TMDb, the framework integrates ML algorithms with three complementary XAI tools, namely Permutation Feature Importance (PFI), Feature Importance Ranking Measure (FIRM), and SHapley Additive exPlanations (SHAP), to balance predictive accuracy with interpretability. By providing both global and local explanations of model outcomes, this approach enhances stakeholders’ trust in AI-driven predictions. Results reveal that financial factors and audience sentiment are key determinants of profitability and that integrating social media text significantly improves forecasting precision. Beyond the film industry, the proposed framework offers a transferable model for data-driven decision-making in other sectors such as e-commerce and streaming services, where consumer engagement and sentiment are critical success factors. This study contributes to Sustainable Development Goal by promoting transparent and responsible use of AI for innovation and risk reduction in creative industries.
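Of the three XAI tools named in the abstract, PFI is the simplest to state: shuffle one feature's values across rows, re-score the model, and read the loss in accuracy as that feature's importance. The sketch below is a minimal, dependency-free Python illustration of that general technique, not the authors' implementation. The toy model, the three feature columns (loosely imagined as budget, trailer sentiment, and an unused feature), and the use of mean squared error as the score are all hypothetical choices made for illustration. In practice PFI is usually computed on held-out data, for example with scikit-learn's `sklearn.inspection.permutation_importance`.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = List[float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between targets and predictions."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def permutation_importance(
    predict: Callable[[Row], float],
    X: List[Row],
    y: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean increase in MSE when each feature column is shuffled.

    A larger value means the model relies more on that feature.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link with the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(mean(increases))
    return importances


# Hypothetical toy data: column 0 ~ "budget", column 1 ~ "sentiment",
# column 2 is a feature the model ignores.
X = [[float(i), float((i * 7) % 50), float((i * 13) % 50)] for i in range(50)]


def toy_model(row: Row) -> float:
    return 3.0 * row[0] + 0.5 * row[1]


y = [toy_model(row) for row in X]
scores = permutation_importance(toy_model, X, y)
print(scores)  # expect: feature 0 > feature 1 > 0.0 == feature 2
```

Because the toy model ignores column 2, shuffling it leaves every prediction unchanged, so its importance is exactly zero; the gap between columns 0 and 1 reflects the model's larger coefficient on column 0. Averaging over several shuffles (`n_repeats`) reduces the variance of each estimate.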

Similar Papers
  • Research Article
  • Citations: 63
  • 10.1016/j.habitatint.2022.102660
An explainable model for the mass appraisal of residences: The application of tree-based Machine Learning algorithms and interpretation of value determinants
  • Aug 31, 2022
  • Habitat International
  • Muzaffer Can Iban

  • Research Article
  • 10.18240/ijo.2025.07.04
Associations between organophosphorus pesticides exposure and age-related macular degeneration risk in U.S. adults: analysis from interpretable machine learning approaches.
  • Jul 18, 2025
  • International journal of ophthalmology
  • Yu-Xin Jiang + 2 more

This study investigated the associations between urinary dialkyl phosphate (DAP) metabolites of organophosphorus pesticide (OPP) exposure and age-related macular degeneration (AMD) risk. Participants were drawn from the National Health and Nutrition Examination Survey (NHANES) between 2005 and 2008. Urinary DAP metabolites were used to construct a machine learning (ML) model for AMD prediction. Several interpretability pipelines, including permutation feature importance (PFI), partial dependence plots (PDP), and SHapley Additive exPlanations (SHAP), were employed to analyze how exposure features influence prediction outcomes. A total of 1845 participants were included, of whom 137 were diagnosed with AMD. Receiver operating characteristic (ROC) curve analysis identified Random Forest (RF) as the best-performing of eleven ML models. PFI and SHAP analyses showed that DAP metabolites contributed substantially to AMD risk prediction, more than most of the socio-demographic covariates. Shapley values and waterfall plots of randomly selected AMD individuals emphasized the predictive capacity of ML, with high accuracy and sensitivity in each case. The relationships and interactions visualized by graphical plots and supported by statistical measures demonstrated the indispensable impact of six DAP metabolites on the prediction of AMD risk. Urinary DAP metabolites of OPP exposure are associated with AMD risk, and ML algorithms show excellent generalizability and discriminative ability in AMD risk prediction.

  • Research Article
  • Citations: 1
  • 10.51594/farj.v6i6.1233
Integrating machine learning algorithms into audit processes: Benefits and challenges
  • Jun 15, 2024
  • Finance & Accounting Research Journal
  • Beatrice Oyinkansola Adelakun + 3 more

The integration of machine learning (ML) algorithms into audit processes represents a significant advancement in the field of auditing, offering substantial benefits in terms of efficiency, accuracy, and risk management. This review examines the transformative potential of ML in auditing, highlighting its key benefits and the challenges that must be addressed to fully leverage its capabilities. Machine learning algorithms, with their ability to analyze large datasets and identify patterns, enhance the accuracy and thoroughness of audits. Traditional auditing methods often rely on sampling and manual checks, which can miss anomalies and fraudulent activities. In contrast, ML algorithms can process entire datasets, uncovering subtle patterns and irregularities that may indicate fraud or errors. This comprehensive analysis reduces the risk of oversight and improves the reliability of audit findings. One of the primary benefits of ML in auditing is its capacity for anomaly detection. ML models can be trained on historical data to understand normal financial behavior and flag deviations that might signify irregularities. This ability to detect anomalies in real-time enables auditors to identify potential issues promptly, reducing the time lag between occurrence and detection of fraud. Predictive analytics, powered by ML, further enhances audit processes by forecasting future risks based on historical data. This proactive approach allows auditors to anticipate and mitigate risks before they materialize, contributing to more robust risk management strategies. Despite these advantages, integrating ML into audit processes presents several challenges. Ensuring data quality and integrity is crucial, as ML algorithms are only as good as the data they analyze. Poor-quality data can lead to inaccurate predictions and conclusions. 
Additionally, the "black box" nature of some ML algorithms can pose transparency issues, making it difficult for auditors to explain how specific conclusions were reached, which is critical for stakeholder trust and regulatory compliance. Another significant challenge is the potential for algorithmic bias. ML models can inadvertently perpetuate existing biases in the data, leading to unfair or skewed audit outcomes. Continuous monitoring and validation of ML algorithms are necessary to detect and mitigate such biases. In conclusion, while integrating machine learning algorithms into audit processes offers substantial benefits in terms of accuracy, efficiency, and risk management, it also necessitates careful attention to data quality, transparency, and bias mitigation. Addressing these challenges is essential to fully realize the potential of ML in enhancing audit practices. Keywords: Benefits, Challenges, Audit Processes, Algorithms, ML.

  • Research Article
  • Citations: 2
  • 10.1016/j.imed.2024.09.005
Blood pressure abnormality detection and Interpretation utilizing Explainable Artificial Intelligence
  • Feb 1, 2025
  • Intelligent Medicine
  • Hedayetul Islam + 2 more

  • Research Article
  • Citations: 13
  • 10.1088/2057-1976/acb1b3
Can we explain machine learning-based prediction for rupture status assessments of intracranial aneurysms?
  • Mar 10, 2023
  • Biomedical Physics & Engineering Express
  • Nan Mu + 8 more

Although applying machine learning (ML) algorithms to rupture status assessment of intracranial aneurysms (IA) has yielded promising results, the opaqueness of some ML methods has limited their clinical translation. We presented the first explainability comparison of six commonly used ML algorithms: multivariate logistic regression (LR), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), multi-layer perceptron neural network (MLPNN), and Bayesian additive regression trees (BART). A total of 112 IAs with known rupture status were selected for this study. The ML-based classification used two anatomical features, nine hemodynamic parameters, and thirteen morphologic variables. We utilized permutation feature importance, local interpretable model-agnostic explanations (LIME), and SHapley Additive exPlanations (SHAP) to explain and analyze the six ML algorithms. All models performed comparably: LR area under the curve (AUC) was 0.71; SVM AUC was 0.76; RF AUC was 0.73; XGBoost AUC was 0.78; MLPNN AUC was 0.73; BART AUC was 0.73. Our interpretability analysis demonstrated consistent results across all the methods; i.e., the utility of the top 12 features was broadly consistent. Furthermore, the contributions of 9 important features (aneurysm area, aneurysm location, aneurysm type, wall shear stress maximum during systole, ostium area, the size ratio between aneurysm width, (parent) vessel diameter, one standard deviation among time-averaged low shear area, and one standard deviation of temporally averaged low shear area less than 0.4 Pa) were nearly the same. This research suggested that ML classifiers can provide explainable predictions consistent with general domain knowledge concerning IA rupture. With an improved understanding of ML algorithms, clinicians’ trust in them will be enhanced, accelerating their clinical translation.

  • Research Article
  • Citations: 10
  • 10.1111/gwao.12748
“On and off screen: Women's work in the screen industries”
  • Sep 8, 2021
  • Gender, Work & Organization
  • Louise Wallenberg + 1 more

  • Research Article
  • 10.3390/realestate2030012
Machine Learning Algorithms and Explainable Artificial Intelligence for Property Valuation
  • Aug 1, 2025
  • Real Estate
  • Gabriella Maselli + 1 more

The accurate estimation of urban property values is a key challenge for appraisers, market participants, financial institutions, and urban planners. In recent years, machine learning (ML) techniques have emerged as promising tools for price forecasting due to their ability to model complex relationships among variables. However, their application raises two main critical issues: (i) the risk of overfitting, especially with small datasets or with noisy data; (ii) the interpretive issues associated with the “black box” nature of many models. Within this framework, this paper proposes a methodological approach that addresses both these issues, comparing the predictive performance of three ML algorithms—k-Nearest Neighbors (kNN), Random Forest (RF), and the Artificial Neural Network (ANN)—applied to the housing market in the city of Salerno, Italy. For each model, overfitting is preliminarily assessed to ensure predictive robustness. Subsequently, the results are interpreted using explainability techniques, such as SHapley Additive exPlanations (SHAPs) and Permutation Feature Importance (PFI). This analysis reveals that the Random Forest offers the best balance between predictive accuracy and transparency, with features such as area and proximity to the train station identified as the main drivers of property prices. kNN and the ANN are viable alternatives that are particularly robust in terms of generalization. The results demonstrate how the defined methodological framework successfully balances predictive effectiveness and interpretability, supporting the informed and transparent use of ML in real estate valuation.

  • Research Article
  • Citations: 3
  • 10.3390/bios15040220
Advancements in Circulating Tumor Cell Detection for Early Cancer Diagnosis: An Integration of Machine Learning Algorithms with Microfluidic Technologies.
  • Mar 29, 2025
  • Biosensors
  • Ling An + 2 more

Circulating tumor cells (CTCs) are vital indicators of metastasis and provide a non-invasive method for early cancer diagnosis, prognosis, and therapeutic monitoring. However, their low prevalence and heterogeneity in the bloodstream pose significant challenges for detection. Microfluidic systems, or "lab-on-a-chip" devices, have emerged as a revolutionary tool in liquid biopsy, enabling efficient isolation and analysis of CTCs. These systems offer advantages such as reduced sample volume, enhanced sensitivity, and the ability to integrate multiple processes into a single platform. Several microfluidic techniques, including size-based filtration, dielectrophoresis, and immunoaffinity capture, have been developed to enhance CTC detection. The integration of machine learning (ML) with microfluidic systems has further improved the specificity and accuracy of CTC detection, significantly advancing the speed and efficiency of early cancer diagnosis. ML models have enabled more precise analysis of CTCs by automating detection processes and enhancing the ability to identify rare and heterogeneous cell populations. These advancements have already demonstrated their potential in improving diagnostic accuracy and enabling more personalized treatment approaches. In this review, we highlight the latest progress in the integration of microfluidic technologies and ML algorithms, emphasizing how their combination has changed early cancer diagnosis and contributed to significant advancements in this field.

  • Research Article
  • 10.1038/s41598-025-19959-8
Predicting carotid plaques in metabolic dysfunction-associated steatotic liver disease using machine learning and SHAP interpretation.
  • Oct 15, 2025
  • Scientific reports
  • Shu-Mei Zhai + 3 more

Cardiovascular disease (CVD) remains the most common cause of death worldwide. Carotid plaque is an indicator of subclinical CVD. Metabolic dysfunction-associated steatotic liver disease (MASLD) is a risk factor for atherosclerotic CVD. We aimed to develop and validate a predictive model for carotid plaque occurrence in annual health check-up populations by integrating health check-up indicators with machine learning (ML) algorithms and LASSO-based feature selection, and to leverage advanced interpretability frameworks to elucidate the contribution of individual risk factors. In this retrospective cohort study, we enrolled 4,973 MASLD patients, among whom 1,178 were diagnosed with carotid plaques using carotid ultrasound. Collected baseline data included demographic indicators, clinical histories, blood biochemical parameters, and liver function test indicators. A predictive model for carotid plaques was developed and validated using five ML algorithms. Model performance was evaluated based on the area under the curve, sensitivity, specificity, accuracy, and F1 score. For model interpretability, we adopted the SHapley Additive exPlanations (SHAP) framework to quantify the contribution of individual features to the prediction outcomes. Among the five ML models, the support vector machine model demonstrated superior discriminative capability, higher goodness-of-fit, and greater clinical utility compared with the other models. Moreover, age, systolic blood pressure, total cholesterol, sex, and fasting plasma glucose were the most important risk factors associated with carotid plaques in the MASLD population. This study demonstrated the feasibility of constructing a predictive model for carotid plaques in MASLD populations using health check-up indicators combined with ML algorithms.
The application of SHAP methods enhanced model interpretability by quantifying the contribution of individual risk factors to prediction outcomes, enabling clinicians to identify high-risk MASLD patients prone to carotid plaque development so that they can adjust interventions accordingly.

  • Research Article
  • Citations: 6
  • 10.3389/fcimb.2023.1289124
Gut microbiota landscape and potential biomarker identification in female patients with systemic lupus erythematosus using machine learning.
  • Dec 19, 2023
  • Frontiers in Cellular and Infection Microbiology
  • Wenzhu Song + 6 more

Systemic Lupus Erythematosus (SLE) is a complex autoimmune disease that disproportionately affects women. Early diagnosis and prevention are crucial for women's health, and the gut microbiota has been found to be strongly associated with SLE. This study aimed to identify potential biomarkers for SLE by characterizing the gut microbiota landscape using feature selection and exploring the use of machine learning (ML) algorithms with significantly dysregulated microbiotas (SDMs) for early identification of SLE patients. Additionally, we used the SHapley Additive exPlanations (SHAP) interpretability framework to visualize the impact of SDMs on the risk of developing SLE in females. Stool samples were collected from 54 SLE patients and 55 Negative Controls (NC) for microbiota analysis using 16S rRNA sequencing. Feature selection was performed using Elastic Net and Boruta on species-level taxonomy. Subsequently, four ML algorithms, namely logistic regression (LR), Adaptive Boosting (AdaBoost), Random Forest (RF), and eXtreme gradient boosting (XGBoost), were used to achieve early identification of SLE with SDMs. Finally, the best-performing algorithm was combined with SHAP to explore how SDMs affect the risk of developing SLE in females. Both alpha and beta diversity were found to be different in the SLE group. Following feature selection, 68 and 21 microbiota were retained by Elastic Net and Boruta, respectively, with 16 microbiota overlapping between the two, i.e., SDMs for SLE. The four ML algorithms with SDMs could effectively identify SLE patients, with XGBoost performing the best, achieving Accuracy, Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, and AUC values of 0.844, 0.750, 0.938, 0.923, 0.790, and 0.930, respectively. The SHAP interpretability framework showed a complex non-linear relationship between the relative abundance of SDMs and the risk of SLE, with Escherichia_fergusonii having the largest SHAP value.
This study revealed dysbiosis in the gut microbiota of female SLE patients. ML classifiers combined with SDMs, particularly XGBoost, can facilitate early identification of female patients with SLE. The SHAP interpretability framework provides insight into the impact of SDMs on the risk of SLE and may inform future scientific treatment for SLE.

  • Research Article
  • Citations: 4
  • 10.1007/s13202-024-01789-5
Automated real-time prediction of geological formation tops during drilling operations: an applied machine learning solution for the Norwegian Continental Shelf
  • Apr 8, 2024
  • Journal of Petroleum Exploration and Production Technology
  • Behzad Elahifar + 1 more

Accurate prediction of geological formation tops is a crucial task for optimizing hydrocarbon exploration and production activities. This research conducts a comprehensive comparative analysis of several advanced machine learning approaches for geological formation top prediction within the complex Norwegian Continental Shelf (NCS) region. The study evaluates and benchmarks the performance of four prominent machine learning models: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), the Random Forest ensemble method, and a Multi-Layer Perceptron (MLP) neural network. To facilitate a rigorous assessment, the models are evaluated on two distinct datasets: a dedicated test dataset and an independent blind dataset for validation. The evaluation criteria focus on quantifying the models' predictive accuracy in classifying multiple geological formation top types. Additionally, the study employs the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm as a baseline to contextualize the relative performance of the machine learning models against a conventional clustering approach. Leveraging two model-agnostic feature importance techniques, Permutation Feature Importance (PFI) and SHapley Additive exPlanations (SHAP), the investigation identifies and ranks the most influential input variables driving the models' predictions. The analysis reveals the MLP neural network model as the top-performing approach, achieving a near-perfect accuracy of 0.99 on the blind validation dataset and surpassing the other machine learning techniques as well as the DBSCAN benchmark. However, the SVM model attains superior performance on the initial test dataset, with an accuracy of 0.99.
Notably, the PFI and SHAP analyses consistently identify depth (DEPT), revolutions per minute (RPM), and hook load (HKLD) as the three most impactful parameters influencing model predictions across the different algorithms. These findings underscore the potential of sophisticated machine learning methodologies, particularly neural network-based models, to significantly enhance the accuracy of geological formation top prediction within the geologically complex NCS region. However, the study emphasizes the need for further extensive testing on larger datasets to validate the generalizability of the high performance observed. Overall, this research delivers a comparative evaluation of state-of-the-art machine learning techniques, offering insights to guide the selection, development, and real-world deployment of accurate and reliable predictive modeling strategies for hydrocarbon exploration and reservoir characterization in the NCS.

  • Research Article
  • 10.1093/neuonc/noad179.0081
BIOS-04. THE ROLE OF MACHINE LEARNING, PREDICTIVE MODELING, AND DEEP LEARNING IN ASSESSING METABOLIC BIOMARKERS TO IMPROVE PROGNOSTICATION IN GLIOBLASTOMA MULTIFORME
  • Nov 10, 2023
  • Neuro-Oncology
  • Cathleen Kuo + 2 more

Glioblastoma multiforme (GBM) is the most common primary malignant brain tumor in the United States, accounting for approximately 56.6% of all gliomas and 47.7% of all primary malignant CNS tumors. The prognosis of GBM is notably grim, with a 1-year relative survival rate of 41.4% and a 5-year survival rate of 5.8% following diagnosis. Recent efforts to identify potential therapeutic targets have utilized tumor omics data integrated with clinical information using machine learning (ML) algorithms. However, there remains a paucity of studies assessing the value of these ML models as prognostic tools in GBM. A systematic search adhering to PRISMA guidelines was conducted to identify all studies describing the use of an ML algorithm involving GBM metabolic biomarkers and each algorithm's accuracy. Ten studies were included in the final analysis. They were diagnostic (n = 3, 30%), prognostic (n = 6, 60%), or both (n = 1, 10%). Most studies analyzed data from multiple databases, while 50% (n = 5) included additional original samples. At least 2,536 data samples were run through an ML algorithm. A total of 27 ML algorithms were recorded, with a mean of 2.8 algorithms per study. Algorithms were supervised (n = 22, 79%) or unsupervised (n = 6, 21%), and continuous (n = 21, 75%) or categorical (n = 7, 25%). The mean reported accuracy and ROC AUC were 95.63% and 0.779, respectively. A total of 106 metabolic markers were identified, but only EMP3 was reported in multiple studies. Many studies have identified potential biomarkers for GBM diagnosis and prognostication. These algorithms show promise, although a consensus on even a handful of biomarkers has not been reached. Integrating ML algorithms for biomarker detection with radiomics-based tumor imaging will be necessary to achieve the greatest level of accuracy and precision.

  • Research Article
  • Citations: 7
  • 10.31083/j.rcm2501008
Machine Learning for Detecting Atrial Fibrillation from ECGs: Systematic Review and Meta-Analysis.
  • Jan 8, 2024
  • Reviews in cardiovascular medicine
  • Chenggong Xie + 4 more

Atrial fibrillation (AF) is a common arrhythmia that can result in adverse cardiovascular outcomes but is often difficult to detect. The use of machine learning (ML) algorithms for detecting AF has become increasingly prevalent in recent years. This study aims to systematically evaluate and summarize the overall diagnostic accuracy of the ML algorithms in detecting AF in electrocardiogram (ECG) signals. The searched databases included PubMed, Web of Science, Embase, and Google Scholar. The selected studies were subjected to a meta-analysis of diagnostic accuracy to synthesize the sensitivity and specificity. A total of 14 studies were included, and the forest plot of the meta-analysis showed that the pooled sensitivity and specificity were 97% (95% confidence interval [CI]: 0.94-0.99) and 97% (95% CI: 0.95-0.99), respectively. Compared to traditional machine learning (TML) algorithms (sensitivity: 91.5%), deep learning (DL) algorithms (sensitivity: 98.1%) showed superior performance. Using multiple datasets and public datasets alone or in combination demonstrated slightly better performance than using a single dataset and proprietary datasets. ML algorithms are effective for detecting AF from ECGs. DL algorithms, particularly those based on convolutional neural networks (CNN), demonstrate superior performance in AF detection compared to TML algorithms. The integration of ML algorithms can help wearable devices diagnose AF earlier.

  • Research Article
  • 10.18203/issn.2455-4510.intjresorthop20240402
Supervised machine learning algorithms used to predict post-surgical outcomes following anterior surgical fixation of odontoid fractures
  • Feb 26, 2024
  • International Journal of Research in Orthopaedics
  • Mikayla Kricfalusi + 9 more

Background: Odontoid fractures have a high mortality rate, and numerous classification systems have previously been used to predict surgical outcomes, with mixed consensus. We generated a machine learning (ML) construct to predict post-operative adverse events following anterior open reduction and internal fixation (ORIF) of odontoid fractures. Methods: 266 patients from the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) who underwent anterior ORIF (CPT 22318) of odontoid fractures from 2008 to 2018 were analyzed using the ML algorithms random forest classifier (RF), gradient boosting classifier (GB), support vector machine classifier (SVM), Gaussian Naive Bayes classifier (GNB), and multi-layer perceptron classifier (MLP), which were compared with a logistic regression classifier (LR). Algorithms predicted increased length of stay (LOS), need for transfusion (Transf), non-home discharge (NHD), and any adverse event (AAE). Permutation feature importance (PFI) identified risk factors. Results: ML algorithms outperformed LR. The average AUC was 0.635 for predicting Transf (accuracy = 77.4%), 0.652 for extended LOS (accuracy = 59.6%), 0.788 for NHD (accuracy = 71.9%), and 0.649 for AAE (accuracy = 68.1%). GB performed best for Transf (AUC = 0.861), identifying operative time (PFI = 0.253, p = 0.016). GB and RF performed equally for NHD (AUC = 0.819), highlighting preoperative hematocrit (PFI = 0.157, p < 0.001). GB predicted AAE (AUC = 0.720), also identifying preoperative hematocrit (PFI = 0.112, p < 0.001). RF predicted extended LOS (AUC = 0.669), highlighting preoperative hematocrit (PFI = 0.049, p < 0.001). Conclusions: ML outperformed LR, successfully predicting Transf, extended LOS, NHD, and AAE for anterior ORIF of odontoid fractures. Our construct may complement conventional risk stratification to reduce adverse outcomes and excess cost.

  • Research Article
  • Citations: 6
  • 10.1016/j.compbiomed.2023.107871
Explainable machine learning model based on clinical factors for predicting the disappearance of indeterminate pulmonary nodules
  • Dec 22, 2023
  • Computers in Biology and Medicine
  • Jingxuan Wang + 5 more
