- Research Article
- 10.1145/3715710
- Apr 9, 2025
- ACM Transactions on Interactive Intelligent Systems
- Mohammed Alhamadi + 3 more
Information presentation problems on interactive dashboards are known to hinder decision-making. Since a traditional user-centred approach to designing usable dashboards cannot fully satisfy user demands, needs and skills, we isolate behavioural indicators of usability when users conduct typical information-seeking and comparison tasks. In a first study (N = 50), we identified strategies derived from 486,435 interaction events logged in a controlled setting with synthetic dashboards. User models consisting of these user strategies and graph literacy produced strong signals indicating that usability was predictable. In a second study (N = 65), we tested the initial insights on real-world dashboards. While most of our hypotheses were confirmed, graph literacy emerged as the best predictor of usability. Usability was better predicted in dashboards with problems, suggesting promising opportunities for automated usability evaluation and real-time support for users struggling with visual analytics dashboards.
- Research Article
- 10.1145/3707649
- Feb 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Philipp Schoenegger + 4 more
Large language models (LLMs) match and sometimes exceed human performance in many domains. This study explores the potential of LLMs to augment human judgment in a forecasting task. We evaluate the effect on human forecasters of two LLM assistants: one designed to provide high-quality (“superforecasting”) advice, and the other designed to be overconfident and base-rate neglecting, thus providing noisy forecasting advice. We compare participants using these assistants to a control group that received a less advanced model that did not provide numerical predictions or engage in explicit discussion of predictions. Participants (N = 991) answered a set of six forecasting questions and had the option to consult their assigned LLM assistant throughout. Our preregistered analyses show that interacting with each of our frontier LLM assistants significantly enhances prediction accuracy by between 24% and 28% compared to the control group. Exploratory analyses showed a pronounced outlier effect in one forecasting item, without which we find that the superforecasting assistant increased accuracy by 41%, compared with 29% for the noisy assistant. We further examine whether LLM forecasting augmentation disproportionately benefits less skilled forecasters, degrades the wisdom-of-the-crowd by reducing prediction diversity, or varies in effectiveness with question difficulty. Our data do not consistently support these hypotheses. Our results suggest that access to a frontier LLM assistant, even a noisy one, can be a helpful decision aid in cognitively demanding tasks compared to a less powerful model that does not provide specific forecasting advice. However, the effects of outliers suggest that further research into the robustness of this pattern is needed.
- Research Article
- 10.1145/3696423
- Feb 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Monika Westphal + 5 more
Recent work has proposed AI models that can learn to decide whether to make a prediction for a task instance or to delegate it to a human by considering both parties’ capabilities. In simulations with synthetically generated or context-independent human predictions, delegation can help improve the performance of human-AI teams—compared to humans or the AI model completing the task alone. However, so far, it remains unclear how humans perform and how they perceive the task when individual instances of a task are delegated to them by an AI model. In an experimental study with 196 participants, we show that task performance and task satisfaction improve for the instances delegated by the AI model, regardless of whether humans are aware of the delegation. Additionally, we identify humans’ increased levels of self-efficacy as the underlying mechanism for these improvements in performance and satisfaction, and one dimension of cognitive ability as a moderator of this effect. In particular, AI delegation can buffer potential negative effects on task performance and task satisfaction for humans with low visual processing ability. Our findings provide initial evidence that allowing AI models to take over more management responsibilities can be an effective form of human-AI collaboration in workplaces.
- Research Article
- 10.1145/3709012
- Feb 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Riku Arakawa + 2 more
Multimodal scene search of conversations is essential for unlocking valuable insights into social dynamics and enhancing our communication. While experts in conversational analysis have their own knowledge and skills to find key scenes, the lack of comprehensive, user-friendly tools that streamline the processing of diverse multimodal queries impedes efficiency and objectivity. To address this gap, we developed ConverSearch, a visual-programming-based tool whose interface and implementation design build on insights derived from a formative study with experts. The tool allows experts to integrate various machine learning algorithms to capture human behavioral cues without the need for coding. Our user study, employing the System Usability Scale (SUS) and satisfaction metrics, demonstrated high user preference, reflecting the tool’s ease of use and effectiveness in supporting scene search tasks. Additionally, through a deployment trial within industrial organizations, we confirmed the tool’s objectivity, reusability, and potential to enhance expert workflows. This suggests the advantages of expert-AI collaboration in domains requiring human contextual understanding and demonstrates how customizable, transparent tools yielding reusable artifacts can support expert-driven tasks in complex, multimodal environments.
- Research Article
- 10.1145/3690829
- Jan 16, 2025
- ACM Transactions on Interactive Intelligent Systems
- Matthew Wilchek + 2 more
This article introduces the design and prototype of Ajna, a wearable shared perception system for supporting extreme sensemaking in emergency scenarios. Ajna addresses technical challenges in Augmented Reality (AR) devices, specifically the limitations of depth sensors and cameras. These limitations confine object detection to close proximity and hinder perception beyond immediate surroundings, through obstructions, or across different structural levels, impacting collaborative use. Ajna harnesses the Inertial Measurement Unit (IMU) in AR devices to measure users’ relative distances from a set physical point, enabling object detection sharing among multiple users across obstacles like walls and over distances. We tested Ajna’s effectiveness in a controlled study with 15 participants simulating emergency situations in a multi-story building. We found that Ajna improved object detection, location awareness, and situational awareness and reduced search times by 15%. Ajna’s performance in simulated environments highlights the potential of artificial intelligence (AI) to enhance sensemaking in critical situations, offering insights for law enforcement, search and rescue, and infrastructure management.
- Research Article
- 10.1145/3700139
- Jan 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Kazuyuki Fujita + 4 more
Building higher-quality image classification models requires better performance analysis (PA) to help understand their behaviors. We propose ConfusionLens, a dynamic and interactive visualization interface that augments a conventional confusion matrix with focus+context visualization. This interface allows users to seamlessly switch table layouts among three views (overall view, class-level view, and between-class view) while observing all of the instance images in a single screen. We designed and implemented a ConfusionLens prototype that supports hundreds of instances, and then conducted a user study (N = 14) to evaluate it compared to the conventional confusion matrix with a split view of instances. Results show that ConfusionLens achieved faster task-completion time in observing instance-level performance and higher accuracy in observing between-class confusion. Moreover, we conducted an expert interview (N = 6) to investigate the applicability of our interface to practical PA tasks, and then implemented several extensions of ConfusionLens based on the feedback. Feedback on these extensions from users experienced in image classification (N = 5) demonstrated their general usefulness and highlighted their beneficial use in PA tasks.
- Research Article
- 10.1145/3691643
- Jan 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Anıl Doğru + 2 more
Autonomous negotiating agents, which can interact with other agents, aim to solve decision-making problems involving participants with conflicting interests. Designing agents capable of negotiating with human partners requires accounting for factors such as emotional states and arguments. For this purpose, we introduce an extended taxonomy of argument types capturing human speech acts during negotiation. We propose an argument-based automated negotiating agent that can extract human arguments from a chat-based environment using a hierarchical classifier. Consequently, the proposed agent can understand the received arguments and adapt its strategy accordingly while negotiating with its human counterparts. We initially conducted human-agent negotiation experiments to construct a negotiation corpus to train our classifier. The experimental results show that the proposed hierarchical classifier successfully extracts arguments from the given text. Moreover, we conducted a second experiment testing the performance of the designed negotiation strategy, which takes the human opponent’s arguments and emotions into account. Our results showed that the proposed agent beats the human negotiator and gains higher utility than the baseline agent.
- Research Article
- 10.1145/3686164
- Dec 18, 2024
- ACM Transactions on Interactive Intelligent Systems
- Patricia K Kahr + 3 more
People are increasingly interacting with AI systems, but successful interactions depend on people trusting these systems only when appropriate. Since neither gaining trust in AI advice nor restoring lost trust after AI mistakes is always warranted, we seek to better understand the development of trust and reliance in sequential human-AI interaction scenarios. In a 2 × 2 between-subjects simulated AI experiment, we tested how model accuracy (high vs. low) and explanation type (human-like vs. abstract) affect trust and reliance on AI advice for repeated interactions. In the experiment, participants estimated jail times for 20 criminal law cases, first without and then with AI advice. Our results show that trust and reliance are significantly higher for high model accuracy. In addition, reliance does not decline over the trial sequence, and trust increases significantly with high accuracy. Human-like (vs. abstract) explanations increased reliance only in the high-accuracy condition. We furthermore tested the extent to which trust and reliance in a trial round can be explained by trust and reliance experiences from prior rounds. We find that trust assessments in prior trials correlate with trust in subsequent ones. We also find that the cumulative trust experience of a person in all earlier trial rounds correlates with trust in subsequent ones. Furthermore, we find that the two trust measures, trust and reliance, impact each other: prior trust beliefs not only influence subsequent trust beliefs but likewise influence subsequent reliance behavior, and vice versa. A replication study yielded results comparable to our original study, thereby enhancing the validity of our findings.
- Research Article
- 10.1145/3689649
- Dec 18, 2024
- ACM Transactions on Interactive Intelligent Systems
- Jiwon Kim + 6 more
Sketch-based drawing assessments in art therapy are widely used to understand individuals’ cognitive and psychological states, such as cognitive impairments or mental disorders. Along with self-reported questionnaire measures, psychological drawing assessments can augment information regarding an individual’s psychological state. Interpreting drawing assessments demands significant time and effort, particularly for large groups such as schools or companies, and relies on the expertise of art therapists. To address this issue, we propose an artificial intelligence (AI)-based expert support system called AlphaDAPR to support art therapists and psychologists in conducting large-scale automatic drawing assessments. In Study 1, we investigated the user experience of AlphaDAPR. Through surveys involving 64 art therapists, we observed a substantial willingness (64.06% of participants) to use the proposed system. Structural equation modeling highlighted the pivotal role of explainable AI in the interface design, affecting perceived usefulness, trust, satisfaction, and intention to use. However, our interviews unveiled a nuanced perspective: while many art therapists showed a strong inclination to use the proposed system, they also voiced concerns about potential AI limitations and risks. Since most concerns arose from insufficient trust, which was the focal point of our attention, we conducted Study 2 with the aim of enhancing trust. Study 2 delved deeper into the necessity of clear communication regarding the division of roles between AI and users for elevating trust. Through experimentation with another 26 art therapists, we demonstrated that clear communication enhances users’ trust in our system. Our work not only highlights the potential of AlphaDAPR to streamline drawing assessments but also underscores broader implications for human-AI collaboration in psychological domains. By addressing concerns and optimizing communication, we pave the way for a symbiotic relationship between AI and human expertise, ultimately enhancing the efficacy and accessibility of psychological assessment tools.
- Research Article
- 10.1145/3650114
- Dec 17, 2024
- ACM Transactions on Interactive Intelligent Systems
- Zheng Ning + 4 more
Querying structured databases with natural language (NL2SQL) has remained a difficult problem for years. Recently, advances in machine learning (ML), natural language processing (NLP), and large language models (LLMs) have led to significant improvements in performance, with the best model achieving ∼85% accuracy on the benchmark Spider dataset. However, there is still no systematic understanding of the types and causes of errors in NL2SQL models, or of the effectiveness of mechanisms for handling erroneous queries. To bridge this gap, this work builds a taxonomy of errors made by four representative NL2SQL models, along with an in-depth analysis of the errors. Second, the causes of model errors were explored by analyzing the alignment between model attention and human attention to the natural language query. Last, a within-subjects user study with 26 participants was conducted to investigate the effectiveness of three interactive error-handling mechanisms in NL2SQL. Findings from this article shed light on the future design of model structures and of error discovery and repair strategies for natural language data query interfaces.