Identifying the Most Significant Features for Stress Prediction of Automobile Drivers: A Comprehensive Study

May Y Al-Nashashibi, Wael Hadi, Nuha El-Khalili, Ghassan Issa, Abedal-Kareem Al-Banna

doi:10.1142/s0219649223500648

Abstract

Objective: This paper applied three feature selection methods to a Jordanian automobile drivers’ dataset to identify the features that matter most for the performance of stress prediction algorithms. The dataset contains “stress” and “no-stress” classes with 30 features, categorised into physiological and contextual subsets. Methods: Eighteen classifiers from six prediction algorithm categories were evaluated: Rule-based, Tree-based, Ensemble-based, Function-based, Naïve Bayes-based and Lazy-based. Three Feature Subset Selection (FSS) methods were used: Gain Ratio, Chi-square and feature separation. The eight evaluation measures were F1, Accuracy, Specificity, Sensitivity, Kappa Statistics, Mean Absolute Error (MAE), Area Under Curve (AUC) and Precision Recall Curve Area (PRCA). Results: Among the classifiers, the Lazy-based LocalKNN performed significantly well on F1, Accuracy, Kappa and MAE, while the Naïve Bayes-based Bayesian Network excelled on the other measures. The original dataset with all features yielded the best overall performance, followed by the physiological-only subset. The Gain Ratio and Chi-square FSS methods also showed promising results, though not significant. Conclusion: Four physiological features (EMG, EMG Amplitude, Heart rate, Respiration Amplitude) and seven contextual features (time range of driving, gender, age, driving skills, general accidents, last year’s accidents, stress frequency) contributed to the best prediction outcomes. The study highlights the importance of proper feature selection and identifies optimal algorithms for specific measures.
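As an illustration of how a Gain Ratio ranking of features works in principle, the following is a minimal sketch in Python. It is not the authors' implementation: the function names, the toy data and the feature names `heart_rate_level` and `gender` as coded here are assumptions made for the example, and it assumes every feature has already been discretised into categories (continuous signals such as EMG would need binning first).

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the
    entropy of the feature itself (the split information)."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy once the feature value is known.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature carries no information; avoid dividing by zero.
    return info_gain / split_info if split_info > 0 else 0.0


# Toy, made-up records (NOT taken from the study's dataset).
labels = ["stress", "stress", "no-stress", "no-stress"]
heart_rate_level = ["high", "high", "low", "low"]  # separates the classes
gender = ["m", "f", "m", "f"]                      # unrelated to the class

scores = {
    "heart_rate_level": gain_ratio(heart_rate_level, labels),
    "gender": gain_ratio(gender, labels),
}
# Rank features from most to least informative.
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A feature subset is then obtained by keeping the top-ranked features or those above a chosen threshold; the cut-off is a design choice and is not specified in the abstract.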

Full Text