NAPS Fusion: A framework to overcome experimental data limitations to predict human performance and cognitive task outcomes

Nicholas J Napoli,Chad L Stephens,Kellie D Kennedy,Laura E Barnes,Ezequiel Juarez Garcia,Angela R Harrivel

doi:10.1016/j.inffus.2022.09.016

Abstract

In the area of human performance and cognitive research, machine learning (ML) problems become increasingly complex due to limitations in the experimental design, resulting in the development of poor predictive models. More specifically, experimental study designs produce very few data instances, have large class imbalances and conflicting ground truth labels, and generate wide data sets due to the diverse amount of sensors. From an ML perspective these problems are further exacerbated in anomaly detection cases where class imbalances occur and there are almost always more features than samples. Typically, dimensionality reduction methods (e.g., PCA, autoencoders) are utilized to handle these issues from wide data sets. However, these dimensionality reduction methods do not always map to a lower dimensional space appropriately, and they capture noise or irrelevant information. In addition, when new sensor modalities are incorporated, the entire ML paradigm has to be remodeled because of new dependencies introduced by the new information. Remodeling these ML paradigms is time-consuming and costly due to lack of modularity in the paradigm design, which is not ideal. Furthermore, human performance research experiments, at times, creates ambiguous class labels because the ground truth data cannot be agreed upon by subject-matter experts annotations, making ML paradigm nearly impossible to model.This work pulls insights from Dempster–Shafer theory (DST), stacking of ML models, and bagging to address uncertainty and ignorance for multi-classification ML problems caused by ambiguous ground truth, low samples, subject-to-subject variability, class imbalances, and wide data sets. Based on these insights, we propose a probabilistic model fusion approach, Naive Adaptive Probabilistic Sensor (NAPS), which combines ML paradigms built around bagging algorithms to overcome these experimental data concerns while maintaining a modular design for future sensor (new feature integration) and conflicting ground truth data. We demonstrate significant overall performance improvements using NAPS (an accuracy of 95.29%) in detecting human task errors (a four class problem) caused by impaired cognitive states and a negligible drop in performance with the case of ambiguous ground truth labels (an accuracy of 93.93%), when compared to other methodologies (an accuracy of 64.91%). This work potentially sets the foundation for other human-centric modeling systems that rely on human state prediction modeling.

Full Text