Determining Factors Affecting Acceptance of Autonomous Vehicles using Statistical and Machine Learning Models
The aim of this study was to identify the risks and perceptions related to the acceptance of Autonomous Vehicles (AVs) across different segments of society. An online survey was used to collect stated-preference data, and the responses of 465 participants were deemed suitable for analysis. Comparison with traditional vehicles and willingness to use received the highest ratings, while being tech-savvy received the lowest. Parametric analysis and a prediction model, developed using an artificial neural network, were used to analyze the relationships between willingness to use and participants' characteristics and opinions. The results show that gender, age, affinity for technology, and comparison with traditional vehicles have a significant impact on participants' perception; this was shown by the parametric analysis performed at a 5% significance level and later confirmed by the model. The model assigned the highest importance to being tech-savvy (index 0.76), followed by comparison with traditional vehicles (index 0.74). A comparison with a similar study from Saudi Arabia shows that drivers in the two countries perceive AVs significantly differently.
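The abstract reports per-factor importance indices from the ANN but does not say how they were computed. One standard way to derive such indices from any trained model is permutation importance; the sketch below is a generic illustration (the `predict` model, data, and accuracy metric are placeholders, not the study's own): shuffle one input column and measure how much the score drops.

```python
import random

def permutation_importance(predict, X, y, n_features, metric):
    """Generic permutation importance (the abstract does not specify the
    method used): importance of feature j = score drop after shuffling
    column j of the inputs while leaving everything else fixed."""
    base = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(n_features):
        shuffled = [row[:] for row in X]          # copy rows
        col = [row[j] for row in shuffled]
        random.shuffle(col)                       # break the feature-target link
        for row, v in zip(shuffled, col):
            row[j] = v
        score = metric(y, [predict(row) for row in shuffled])
        importances.append(base - score)
    return importances
```

An informative feature yields a large drop; a constant or irrelevant feature yields a drop near zero.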
- Research Article
29
- 10.3389/fcvm.2022.812276
- Apr 6, 2022
- Frontiers in Cardiovascular Medicine
Objective: To compare the performance, clinical feasibility, and reliability of statistical and machine learning (ML) models in predicting heart failure (HF) events. Background: Although ML models have been proposed to revolutionize medicine, their promise in predicting HF events has not been investigated in detail. Methods: A systematic search was performed on Medline, Web of Science, and IEEE Xplore for studies published between January 1, 2011, and July 14, 2021, that developed or validated at least one statistical or ML model predicting all-cause mortality or all-cause readmission of HF patients. The Prediction Model Risk of Bias Assessment Tool was used to assess the risk of bias, and a random-effects model was used to evaluate the pooled c-statistics of the included models. Results: Two hundred and two statistical model studies and 78 ML model studies were included from the retrieved papers. The pooled c-indices of statistical models predicting all-cause mortality, ML models predicting all-cause mortality, statistical models predicting all-cause readmission, and ML models predicting all-cause readmission were 0.733 (95% confidence interval 0.724–0.742), 0.777 (0.752–0.803), 0.678 (0.651–0.706), and 0.660 (0.633–0.686), respectively, indicating that ML models did not show consistent superiority over statistical models. The head-to-head comparison revealed similar results. Meanwhile, the immoderate use of predictors limited the feasibility of ML models. The risk-of-bias analysis indicated that the technical pitfalls of ML models were more serious than those of statistical models. Furthermore, the efficacy of ML models among different HF subgroups remains unclear. Conclusions: ML models did not achieve a significant advantage in predicting events, and their clinical feasibility and reliability were worse.
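The pooled c-statistics above come from a random-effects meta-analysis. As an illustration of the mechanics (not the authors' exact code), the sketch below implements the common DerSimonian-Laird estimator: fixed-effect weights give Cochran's Q, Q yields the between-study variance tau^2, and tau^2 inflates each study's variance before re-weighting. The study estimates and variances in the usage line are made-up examples.

```python
import math

def pooled_random_effects(estimates, variances):
    """DerSimonian-Laird random-effects pooling of per-study estimates
    (e.g., c-statistics); returns the pooled value and a 95% CI."""
    k = len(estimates)
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, estimates)) / sw
    # Cochran's Q and the between-study variance tau^2 (truncated at 0)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # random-effects weights add tau^2 to every within-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# illustrative values only, not the review's data
pooled, ci = pooled_random_effects([0.72, 0.75, 0.70], [0.0004, 0.0009, 0.0001])
```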
- Research Article
13
- 10.3390/met11111858
- Nov 18, 2021
- Metals
The quality of a welded joint is determined by key attributes such as dilution and the weld bead geometry, and achieving optimal values for these attributes is a challenging task. Selecting an appropriate method to derive optimal parameters is the key focus of this paper. This study analyzes several versatile parametric optimization and prediction models and uses statistical and machine learning models for further processing. A statistical method, grey-based Taguchi optimization, is used to optimize input parameters such as welding current, wire feed rate, welding speed, and contact tip to work distance (CTWD). Artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS) models are used to predict the dilution and bead geometry obtained during the welding process. The results corresponding to the initial design of the welding process are used as training and testing data for the ANN and ANFIS models. The proposed methodology is validated with experimental results both outside and inside the initial design. The prediction results produced by the machine learning models agreed significantly more closely with the experimental data than regression analysis did.
- Research Article
1
- 10.1186/s42397-024-00208-8
- Feb 24, 2025
- Journal of Cotton Research
Background: Cotton is one of the most important commercial crops after food crops, especially in countries like India, where it is grown extensively under rainfed conditions. Because of its use in multiple industries, such as the textile, medicine, and automobile industries, it has great commercial importance. The crop's performance is strongly influenced by prevailing weather dynamics, so as the climate changes, assessing how weather changes affect crop performance is essential. Among the available techniques, crop models are the most effective and widely used tools for predicting yields. Results: This study compares statistical and machine learning models in predicting cotton yield across the major producing districts of Karnataka, India, using a long-term dataset (1990–2023) of yield and weather factors. Artificial neural networks (ANNs) performed best, with acceptable yield deviations within ±10% during both the vegetative stage (F1) and mid stage (F2). Model evaluation metrics such as root mean square error (RMSE), normalized root mean square error (nRMSE), and modelling efficiency (EF) were also within acceptable limits in most districts. Furthermore, the tested ANN model was used to assess the importance of the dominant weather factors influencing crop yield in each district; morning relative humidity, both individually and in interaction with maximum and minimum temperature, had a major influence on cotton yield in most of the districts for which yield was predicted.
These differences highlight how weather factors interact differently in each district during cotton yield formation, reflecting the response of each weather factor under the different soil and management conditions across the major cotton-growing districts of Karnataka. Conclusions: Compared with statistical models, machine learning models such as ANNs forecast cotton yield more efficiently because they can capture the interactive effects of weather factors on yield formation at different growth stages. This makes ANNs well suited for yield forecasting under rainfed conditions and for studying the relative impacts of weather factors on yield. The study thus provides valuable insights to support stakeholders in planning effective crop management strategies and formulating relevant policies.
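The evaluation metrics named above (RMSE, nRMSE, and modelling efficiency EF) have standard definitions; a minimal sketch, assuming nRMSE is expressed as a percentage of the observed mean and EF is the Nash-Sutcliffe efficiency:

```python
import math

def yield_metrics(observed, predicted):
    """RMSE, normalized RMSE (% of the observed mean), and Nash-Sutcliffe
    modelling efficiency (EF = 1 means a perfect fit, EF <= 0 means no
    better than predicting the mean)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(sse / n)
    nrmse = 100.0 * rmse / mean_obs
    ef = 1.0 - sse / sst
    return rmse, nrmse, ef
```

A perfect forecast returns (0.0, 0.0, 1.0); the "acceptable limits" mentioned in the abstract are thresholds applied to these values.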
- Research Article
36
- 10.1007/s11356-023-30428-5
- Oct 23, 2023
- Environmental Science and Pollution Research
The escalating level of carbon dioxide (CO2) emissions is the primary driver of global warming, and addressing it is of paramount importance. Timely and accurate prediction, as well as effective control, of CO2 emissions is pivotal for guiding mitigation measures. This paper aims to select the best prediction model for near-real-time daily CO2 emissions in China. The prediction models are based on univariate daily time-series data spanning January 1, 2020, to September 30, 2022. Six models are proposed, including three statistical models: grey prediction (GM(1,1)), autoregressive integrated moving average (ARIMA), and seasonal autoregressive integrated moving average with exogenous factors (SARIMAX), and three machine learning models: artificial neural network (ANN), random forest (RF), and long short-term memory (LSTM). The performance of these six models is evaluated using five criteria: mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and coefficient of determination (R2). Our findings reveal that the three machine learning models consistently outperform the three statistical models across all five criteria. Among them, the LSTM model performs best for daily CO2 emission prediction, with an MSE of 3.5179e-04, an RMSE of 0.0187, an MAE of 0.0140, a MAPE of 14.8291%, and an R2 of 0.9844. This underscores the robustness of the LSTM model in capturing and predicting complex emission patterns, positioning it as the most suitable option for near-real-time daily CO2 emission prediction based on the provided daily time series. Moreover, our results provide valuable insights into emissions forecasting, enabling data-driven decision-making for policymakers and stakeholders.
The accurate and timely predictions offered by the LSTM model can aid in the formulation of effective strategies to mitigate carbon emissions, contributing to a more sustainable future. Furthermore, the findings of this study can enhance our understanding of the dynamics of CO2 emissions, leading to more informed environmental policies and actions aimed at reducing carbon emissions.
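Of the statistical baselines listed, grey prediction GM(1,1) is the least widely known. Below is a minimal sketch of the textbook formulation (accumulated generating operation, background values, a two-parameter least-squares fit, and the exponential response of the whitened equation); the series in the test is illustrative, not the paper's emissions data.

```python
import math

def gm11_forecast(x0, steps=1):
    """Minimal GM(1,1) grey prediction: fit the series x0 and forecast
    `steps` values ahead (a sketch of the statistical baseline named in
    the abstract, not the authors' implementation)."""
    n = len(x0)
    x1 = [sum(x0[:i + 1]) for i in range(n)]                # accumulated series (AGO)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]    # background values
    y = x0[1:]
    # least squares for x0[k] = -a*z[k] + b via 2x2 normal equations
    szz = sum(v * v for v in z); sz = sum(z); sy = sum(y)
    szy = sum(v * w for v, w in zip(z, y)); m = n - 1
    det = m * szz - sz * sz
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    def x1_hat(k):  # response of the whitened (continuous) equation
        return (x0[0] - b / a) * math.exp(-a * k) + b / a
    # de-accumulate to recover forecasts of the original series
    return [x1_hat(n + i) - x1_hat(n + i - 1) for i in range(steps)]
```

On a near-exponential series GM(1,1) tracks the growth closely, which is why it is popular for short, trending series like daily emissions.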
- Conference Article
1
- 10.1115/es2019-3923
- Jul 14, 2019
Thermal load prediction is a key part of energy system management and control in buildings, and its accuracy plays a critical role in improving and maintaining building energy performance and efficiency. To address this issue, various types of prediction models have been considered and studied, such as physics-based, statistical, and machine learning models. Physical models can be accurate but require extended lead time for model development. Statistical models are relatively simple to develop and require less computation time than other models, but they may not provide accurate results for complex energy systems with intricate nonlinear dynamic behavior. This study proposes an Artificial Neural Network (ANN) model, one of the prevalent machine learning methods for predicting building thermal load, combined with the concept of Non-linear Auto-Regression with Exogenous inputs (NARX). The NARX-ANN prediction model is distinguished from typical ANN models because the NARX concept can address nonlinear system behaviors effectively through recurrent architectures and time-indexed features. To examine the suitability and validity of the NARX-ANN model for building thermal load prediction, a case study is carried out using field data from an academic campus building at Mississippi State University. Results show that the proposed NARX-ANN model provides accurate predictions and effectively addresses nonlinear system behaviors.
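The NARX idea is chiefly about how inputs are indexed: the target at time t is regressed on its own recent lags plus lagged exogenous inputs. A sketch of that design-matrix construction follows (the paper feeds such rows to an ANN; the lag orders p and q here are arbitrary illustrations, not the study's configuration):

```python
def narx_design(y, u, p=2, q=2):
    """Build NARX-style regressors: each row pairs target y[t] with
    autoregressive lags y[t-1..t-p] and exogenous lags u[t-1..t-q].
    Illustrative sketch of the time-indexing idea only."""
    rows = []
    start = max(p, q)            # first index where all lags exist
    for t in range(start, len(y)):
        features = [y[t - i] for i in range(1, p + 1)] \
                 + [u[t - j] for j in range(1, q + 1)]
        rows.append((features, y[t]))
    return rows
```

Any regressor, from linear least squares to a recurrent ANN as in the paper, can then be trained on these (features, target) pairs.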
- Research Article
- 10.21037/tgh-24-24
- Oct 1, 2024
- Translational gastroenterology and hepatology
Liver transplantation is the gold standard treatment for patients with hepatocellular carcinoma (HCC). Current allocation systems face a complex issue due to the imbalance between available organs and recipients. The prioritization of HCC patients remains controversial, leading to potential disparities in access to transplantation. Factors such as tumor size, alpha-fetoprotein (AFP) levels, Model for End-Stage Liver Disease (MELD) score, and response to locoregional therapy (LRT) contribute to determining waitlist dropout risk in HCC patients. Several statistical and machine learning (ML) models have been proposed to predict waitlist dropout, incorporating variables related to tumor and patient factors, underlying liver disease, and waitlist time. This narrative review summarizes the evidence on different prediction models of HCC waitlist dropout. All articles published up to December 25, 2023, were considered; articles not based on prediction models using conventional statistical methods or ML models were excluded. Tumor size, AFP levels, MELD score, and LRT response have been shown to affect disease progression in these patients, influencing waitlist dropout. Most articles in the literature are based on statistical models. Both ML and statistical models may offer promising results, but their application is currently limited. Several attempts have been made to find the best model to stratify the risk of waitlist dropout in HCC patients; however, to date, none of the explored models has been implemented, and the allocation of HCC recipients is still based on supplementary scoring systems or geographical criteria. Improving methodology and databases in future research is essential to obtain accurate and reliable models for clinicians; this is the only way to achieve real applicability.
- Research Article
- 10.31436/imjm.v24i04.2895
- Oct 1, 2025
- IIUM Medical Journal Malaysia
INTRODUCTION: Epidemiological studies have emphasized the role of Streptococcus gallolyticus subspecies gallolyticus (Sgg) infection in the development of colorectal cancer (CRC), yet it remains underappreciated. While statistical and machine learning (ML) models can enhance CRC prediction, direct comparisons between them are rare. This study aims to assess the diagnostic accuracy of stool polymerase chain reaction (PCR) for Sgg and the immunochemical fecal occult blood test (iFOBT) for CRC detection and to compare multivariable statistical and ML models in predicting CRC. MATERIALS AND METHODS: A hospital-based case-control study with a reversed-flow design was conducted, involving 33 CRC cases and 80 controls. The analysis incorporated Asia Pacific Colorectal Screening (APCS) risk factors into three predictive models: logistic regression (LR), decision tree (DT), and ensemble Bayesian boosted decision tree (BDT). RESULTS: Combined testing achieved a net sensitivity of 54%, outperforming the individual tests (iFOBT = 12.1%, stool PCR = 48.5%). Among the models, the ensemble BDT approach demonstrated the highest classification accuracy for CRC (BDT = 78.1%; DT = 72.4%; LR = 69.9%). The DT model identified iFOBT as the sole predictor, while the BDT ensemble model prioritized positive stool PCR for Sgg as the primary predictor, followed by normal-to-overweight body mass index and age over 53 years. CONCLUSION: The ensemble ML model incorporating Sgg infection demonstrated superior predictive performance. Screening for Sgg in stool samples shows potential as an early CRC detection strategy, particularly for individuals with a normal-to-overweight BMI and those above 53 years old.
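The reported net sensitivity of 54% for combined testing is consistent with the usual parallel-combination formula, under the (assumed) simplification that the two tests miss cases independently; the abstract itself may have computed the figure directly from case counts.

```python
def net_sensitivity_parallel(sens_a, sens_b):
    """Net sensitivity of two tests combined in parallel (positive if
    either test is positive), assuming conditionally independent tests:
    a case is missed only if both tests miss it."""
    return 1.0 - (1.0 - sens_a) * (1.0 - sens_b)

# iFOBT = 12.1%, stool PCR = 48.5%  ->  about 54%, matching the abstract
combined = net_sensitivity_parallel(0.121, 0.485)
```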
- Research Article
18
- 10.1371/journal.pone.0275702
- Jun 15, 2023
- PLOS ONE
The forecasting of horticulture commodity prices, such as bananas, has wide-ranging impacts on farmers, traders, and end-users. The considerable volatility in horticultural commodity prices has allowed farmers to exploit various local marketplaces for profitable sales of their farm produce. Past attempts to forecast agricultural commodity prices have relied on a wide variety of statistical models, each of which comes with its own set of limitations, and although machine learning models have emerged as formidable alternatives to conventional statistical methods, there is still reluctance to use them for price prediction in India. In the present investigation, we analysed and compared the efficacy of a variety of statistical and machine learning models to obtain accurate price forecasts. Autoregressive Integrated Moving Average (ARIMA), Seasonal Autoregressive Integrated Moving Average (SARIMA), Autoregressive Conditional Heteroscedasticity (ARCH), Generalized Autoregressive Conditional Heteroscedasticity (GARCH), Artificial Neural Network (ANN), and Recurrent Neural Network (RNN) models were fitted to generate reliable predictions of banana prices in Gujarat, India, from January 2009 to December 2019. Empirical comparisons were made between the predictive accuracy of the machine learning (ML) models and the typical stochastic models, and the ML approaches, especially the RNN, surpassed all other models in the majority of situations.
Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), symmetric mean absolute percentage error (SMAPE), mean absolute scaled error (MASE), and mean directional accuracy (MDA) are used to compare the models, and the RNN achieved the lowest values on all error measures. The RNN thus outperforms the other statistical and machine learning techniques considered here for price prediction, while the accuracy of ARIMA, SARIMA, ARCH, GARCH, and ANN falls short of expectations.
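Alongside MAPE and RMSE, the less common accuracy measures used here (SMAPE, MASE, and MDA) can be sketched from their standard definitions (assuming MASE is scaled by the in-sample naive one-step MAE):

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent."""
    return 100.0 / len(actual) * sum(
        abs(f - a) / ((abs(a) + abs(f)) / 2.0)
        for a, f in zip(actual, forecast))

def mase(actual, forecast, train):
    """Mean absolute scaled error: out-of-sample MAE divided by the
    in-sample MAE of the naive one-step (random walk) forecast."""
    naive_mae = sum(abs(train[i] - train[i - 1])
                    for i in range(1, len(train))) / (len(train) - 1)
    mae = sum(abs(f - a) for a, f in zip(actual, forecast)) / len(actual)
    return mae / naive_mae

def mda(actual, forecast):
    """Mean directional accuracy: share of steps where the forecast
    moves in the same direction as the series actually did."""
    hits = sum((actual[t] - actual[t - 1]) * (forecast[t] - actual[t - 1]) > 0
               for t in range(1, len(actual)))
    return hits / (len(actual) - 1)
```

MASE below 1 means the model beats the naive forecast; MDA near 1 means it almost always calls the direction of price movement correctly.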
- Research Article
- 10.1177/03611981241263824
- Aug 6, 2024
- Transportation Research Record: Journal of the Transportation Research Board
As the era of autonomous vehicles (AVs) approaches, understanding how passengers' time use during a trip may change from a traditional vehicle (non-AV) to an AV is important to the adoption and use of AVs. In this study, a latent class analysis (LCA) and a latent transition analysis (LTA) are adopted to investigate the travel activities individuals choose as passengers in a traditional vehicle, such as a car or transit, and the anticipated shift in these activities in an AV. Since individuals may perform different activities for different trip purposes, activity choices and non-AV to AV transition dynamics are explored from two perspectives: commute trips (e.g., to work or school) and non-commute trips (e.g., leisure, errands, or medical). Findings from the LCA models show three distinct groups of individuals with varying activity preferences in a traditional vehicle and four distinct groups that could emerge in an AV. AV users exhibited a higher preference for activities such as texting/browsing social media, relaxing, and working, suggesting that AVs may offer passengers more productive use of their travel time. Furthermore, the LTA model shows that a substantial share of individuals who performed one or two activities in a traditional vehicle become variety seekers who could perform at least four different activities in an AV, further corroborating the finding that AVs could provide more productive and efficient use of travel time.
- Research Article
112
- 10.1007/s10661-019-7330-6
- Mar 5, 2019
- Environmental Monitoring and Assessment
Spatio-temporal land-use change modeling, simulation, and prediction have become critical issues over the last three decades because available models differ in uncertainty, structure, flexibility, accuracy, capacity for improvement, and capability for integration. Many types of models, including dynamic, statistical, and machine learning (ML) models, have therefore been used in the geographic information system (GIS) environment to meet the high-performance requirements of land-use modeling. This paper provides a literature review of models for modeling, simulating, and predicting land-use change to determine the approach that can most realistically simulate land-use changes. The general characteristics of conventional and ML models for land-use change are described, and the different techniques used in the design of these models are classified. The strengths and weaknesses of the various dynamic, statistical, and ML models are determined through analysis and discussion of their characteristics. The review confirms that ML models are the most powerful for simulating land-use change because they can include all driving forces of land-use change in the simulation process and can simulate both linear and non-linear phenomena, which dynamic and statistical models are unable to do. However, ML models also have limitations: some are complex, their simulation rules cannot be changed, and it is difficult to understand how they work within a system. These issues can be mitigated through the use of programming languages such as Python, which in turn improves the simulation capabilities of ML models.
- Research Article
2
- 10.15407/jai2022.02.092
- Dec 29, 2022
- Artificial Intelligence
In this paper we discuss the ongoing joint work contributing to the IIASA (International Institute for Applied Systems Analysis, Laxenburg, Austria) and National Academy of Sciences of Ukraine projects on “Modeling and management of dynamic stochastic interdependent systems for food-water-energy-health security nexus” (see [1-2] and references therein). The project develops methodological and modeling tools aiming to create an Intelligent multimodel Decision Support System (IDSS) and Platform (IDSP), which can integrate national Food, Water, Energy, and Social models with models operating at the global scale (e.g., IIASA GLOBIOM and MESSAGE), in some cases 'downscaling' the results of the latter to the national level. Data harmonization procedures rely on new non-smooth stochastic optimization and stochastic quasigradient (SQG) [3-4] methods for robust off-line and on-line decisions involving large-scale machine learning and Artificial Intelligence (AI) problems, in particular Deep Learning (DL), including deep neural learning or deep artificial neural networks (ANNs). Among the methodological aims of the project is the development of "Models' Linkage" algorithms, which are at the core of the IDSS as they enable distributed models' linkage and data integration into one system on a platform [5-8]. The linkage algorithms solve the problem of linking distributed models, e.g., sectorial and/or regional, into an inter-sectorial, inter-regional integrated model. The linkage problem can be viewed as a general endogenous reinforcement learning problem of how software agents (models) take decisions in order to maximize the "cumulative reward". Based on novel ideas of systems' linkage under asymmetric information and other uncertainties, nested strategic-operational and local-global models are being developed and used in combination with, in general, non-Bayesian probabilistic downscaling procedures.
In this paper we illustrate the importance of iterative "learning" solution algorithms based on stochastic quasigradient (SQG) procedures for robust off-line and on-line decisions involving large-scale Machine Learning, Big Data analysis, Distributed Models Linkage, and robust decision-making problems. Advanced robust statistical analysis and machine learning models of, in general, nonstationary stochastic optimization make it possible to account for potential distributional shifts, heavy tails, and nonstationarities in data streams that can mislead traditional statistical and machine learning models, in particular deep neural learning or deep artificial neural networks (ANNs). The proposed models and methods rely on probabilistic and non-probabilistic (explicitly given or simulated) distributions combining measures of chance, experts' beliefs, and similarity measures (for example, a compressed form of kernel estimators). For highly nonconvex models such as deep ANNs, the SQGs help avoid local solutions. In cases of nonstationary data, the SQGs allow for sequential revisions and adaptation of parameters to the changing environment, possibly based on off-line adaptive simulations. The non-smooth STO approaches and SQG-based iterative solution procedures are illustrated with examples of robust estimation, models' linkage, machine learning, and adaptive Monte Carlo optimization for modeling and managing catastrophic risks (floods, earthquakes, etc.).
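To make the SQG idea concrete, here is a toy sketch (not from the project's codebase): minimizing the nonsmooth expectation E|x - xi| with single-sample subgradients and diminishing steps drives x to the median of xi, illustrating how SQG iterations handle nondifferentiable stochastic objectives one observation at a time.

```python
import random

def sqg_median(samples, x0=0.0, step0=1.0):
    """Stochastic quasigradient (SQG) iteration for the nonsmooth problem
    min_x E|x - xi|, whose solution is the median of xi.  Each step uses
    one sample and the subgradient sign(x - xi), with diminishing steps
    rho_k = step0 / k -- a minimal sketch of the SQG scheme."""
    x = x0
    for k, xi in enumerate(samples, start=1):
        g = 1.0 if x > xi else -1.0 if x < xi else 0.0  # subgradient of |x - xi|
        x -= (step0 / k) * g
    return x

random.seed(0)
data = [random.gauss(5.0, 1.0) for _ in range(20000)]
est = sqg_median(data)   # approaches the median of the distribution, about 5.0
```

The same one-sample-at-a-time structure is what lets SQG methods run on-line, adapting to data streams without storing or revisiting the full dataset.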
- Research Article
1
- 10.1167/tvst.13.8.12
- Aug 8, 2024
- Translational vision science & technology
This study compared the use of optic disc and macular optical coherence tomography measurements to predict glaucomatous visual field (VF) worsening. Machine learning and statistical models were trained on 924 eyes (924 patients) with circumpapillary retinal nerve fiber layer (cp-RNFL) or ganglion cell inner plexiform layer (GC-IPL) thickness measurements. The probability of 24-2 VF worsening was predicted using both trend-based and event-based progression definitions of VF worsening, and the cp-RNFL and GC-IPL predictions were also combined to produce a combined prediction. A held-out test set of 617 eyes was used to calculate the area under the curve (AUC) for the cp-RNFL, GC-IPL, and combined predictions. When using trend-based analysis as ground truth, the AUCs for cp-RNFL, GC-IPL, and combined predictions with the statistical and machine learning models were 0.72, 0.69, 0.73, and 0.78, 0.75, 0.81, respectively; the differences in performance between the cp-RNFL, GC-IPL, and combined predictions were not statistically significant. AUCs were highest in glaucoma suspects using cp-RNFL predictions and highest in moderate/advanced glaucoma using GC-IPL predictions. When using event-based analysis, the AUCs for the statistical and machine learning models were 0.63, 0.68, 0.69, and 0.72, 0.69, 0.73, respectively. AUCs decreased with increasing disease severity for all predictions. Overall, cp-RNFL and GC-IPL predicted VF worsening similarly, but cp-RNFL performed best in early glaucoma stages and GC-IPL in later stages; combining both features led to minimal improvement in predicting progression.
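The AUC comparisons above rely on the standard interpretation of AUC as a Mann-Whitney statistic, which a few lines make explicit (the scores here are illustrative, not the study's predictions):

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    probability that a randomly drawn positive case scores higher than
    a randomly drawn negative case (ties count one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

An AUC of 0.72 therefore means a 72% chance that a worsening eye receives a higher predicted probability than a stable one.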
- Research Article
- 10.7759/cureus.91318
- Aug 1, 2025
- Cureus
Background and objectives: In the past twenty years, several large-scale coronavirus outbreaks have caused heavy loss of life and serious economic damage worldwide. Current global surveillance suggests that similar epidemics may occur again, making timely and accurate forecasting an urgent priority. Yet many existing prediction methods, mainly based on traditional statistical or machine learning techniques, still struggle to deliver both speed and precision. This study explores a generative artificial intelligence-driven approach aimed at narrowing these gaps. Methods: Nine models (three statistical models, three machine learning models, and three generative artificial intelligence models) were compared using weekly COVID-19 case and death data from the United States (US), the United Kingdom (UK), Germany (GE), and Russia (RU) from March 15, 2020, to April 15, 2023. The statistical models were simple moving average (SMA), simple exponential smoothing (SES), and the Holt linear trend model (Holt). The machine learning models were k-nearest neighbor regression (KNN), regression tree (RTree), and multilayer perceptron (MLP). The generative AI models were ChatGPT, DeepSeek (DS), and Kimi. A custom MATLAB program was used to solve the statistical and machine learning models, and the zero-inference forecasting method was used for the generative AI models. Following stepwise prediction theory, error metrics for one-, two-, and three-step forecasts were calculated: mean absolute percentage error (MAPE), mean absolute error (MAE), and root mean square error (RMSE). The forecasting performance of the models was then compared on these one-, two-, and three-step error metrics. Results: In our analysis, generative AI models consistently delivered the most accurate forecasts.
Kimi, in particular, recorded the smallest errors for death predictions and among the lowest for new cases, while DS and ChatGPT also performed well, clearly surpassing the statistical and machine learning approaches in short-term COVID-19 forecasting. Conclusion: The results of this study show that generative AI models achieve superior predictive accuracy and robustness in epidemic forecasting compared to traditional statistical and machine learning models. This research is innovative in its application of generative AI technology to public health decision-making, demonstrating its strong epidemic forecasting capabilities. Given these advantages, public health authorities can integrate generative AI technology into major infectious disease surveillance systems, promote public health data-sharing mechanisms, and incorporate generative AI into epidemic intervention and resource allocation. These measures would enable governments and regulatory agencies worldwide to use generative AI to enhance early warning capabilities and improve their response to future infectious disease epidemics.
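Among the statistical baselines, simple exponential smoothing (SES) also illustrates why multi-step statistical forecasts can struggle with epidemic curves: a minimal sketch (the smoothing constant alpha is chosen arbitrarily here) shows that SES produces a flat forecast at every horizon.

```python
def ses_forecast(series, alpha=0.3, horizon=3):
    """Simple exponential smoothing (SES), one of the statistical
    baselines in the comparison: the level is updated as
    l = alpha*y + (1 - alpha)*l, and every h-step-ahead forecast
    equals the final level (SES multi-step forecasts are flat)."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon
```

A flat two- or three-step forecast cannot follow a rising or falling epidemic wave, which is consistent with the larger multi-step errors reported for the statistical models.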
- Conference Article
- 10.3390/ecws-4-06441
- Nov 12, 2019
Pipe failures in Water Distribution Networks (WDNs) may cause economic, environmental, and social costs. The application of statistical and Machine Learning (ML) models plays a critical role in planning and decision support processes for WDN management. Failure models can provide valuable information for prioritizing system rehabilitation even in data-scarce scenarios (such as developing countries). This study compares several statistical and ML pipe failure models, providing useful information for practitioners to select a suitable model according to their needs. Three statistical models (Linear, Poisson, and Evolutionary Polynomial Regression) were used for pipe failure prediction based on diameter, pipe age, and length as explanatory variables. The K-means clustering approach was applied to improve the performance of the statistical models. The performance indicators used were the coefficient of determination (R2) and the root mean square error (RMSE). ML approaches, namely Gradient Boosted Trees (GBT), Bayes, Support Vector Machines, and Artificial Neural Networks (ANNs), were compared in predicting individual pipe failure rates. Pipe attributes and environmental and operational variables were included as input variables, and performance was evaluated using confusion matrices and receiver operating characteristic curves. The proposed approach was applied to a WDN in Bogotá (Colombia). The results showed that the cluster-based prediction model reduces the prediction error for pipe failures. All the statistical models demonstrated acceptable performance (R2 between 0.695 and 0.927 and RMSE between 22 and 45 for the test sample). Among the ML models, all methods but the ANNs showed acceptable performance; the GBT approach was the best-performing classifier (79.41% correct predictions in the test sample) and was used to calculate the failure rate of individual pipes for rehabilitation planning.
Furthermore, a sensitivity analysis of the GBT model with respect to the input variables was performed to provide information on its generalization capability.
- Research Article
9
- 10.1016/j.geoen.2023.212086
- Jul 8, 2023
- Geoenergy Science and Engineering
Machine learning approaches for formation matrix volume prediction from well logs: Insights and lessons learned