- Research Article
- 10.1145/3779127
- Dec 15, 2025
- ACM Transactions on Interactive Intelligent Systems
- Yunyao Li + 2 more
Generative AI increasingly reshapes how people engage with interactive systems. It now plays a vital role in designing, studying, and refining human-centered methods that let individuals interact and collaborate with AI, strengthening their agency and control. This special issue highlights the human role in Generative AI and seeks approaches that equip diverse stakeholders across socio-technical contexts to understand, direct, and steer these systems while enabling responsible innovation. In this special issue, we publish original research on new interaction techniques that integrate human input into Generative AI’s continual development, studies of interaction paradigms that support more effective human–AI collaboration, and work that deepens understanding of model capabilities. In doing so, we aim to build a research community around Human-Centric GenAI that empowers people to actively shape systems in line with their values, needs, and expectations.
- Research Article
- 10.1145/3744750
- Dec 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Matt-Heun Hong + 1 more
We investigate the potential of leveraging the code-generating capabilities of Large Language Models (LLMs) to support exploratory visual analysis (EVA) via conversational user interfaces (CUIs). We developed a technology probe that was deployed through two studies with a total of 50 data workers to explore the structure and flow of visual analytic conversations during EVA. We analyzed conversations from both studies using thematic analysis and derived a state transition diagram summarizing the conversational flow between four states of participant utterances (Analytic Tasks, Editing Operations, Elaborations and Enrichments, and Directive Commands) and two states of Generative AI (GenAI) agent responses (visualization, text). We describe the capabilities and limitations of GenAI agents according to each state, and characterize the transitions between states as three co-occurring loops: analysis elaboration, refinement, and explanation. We discuss our findings as future research trajectories to improve the experiences of data workers using GenAI. The code and data are available at https://osf.io/6wxpa.
- Research Article
- 10.1145/3774752
- Dec 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Max Fowler + 5 more
Code-reading ability has traditionally been under-emphasized in assessments because it is difficult to assess at scale. Prior research has shown that code-reading and code-writing are closely related skills; thus, the ability to assess and train code-reading skills may be necessary for student learning. One way to assess code-reading ability is with Explain in Plain English (EiPE) questions, which ask students to describe in natural language what a piece of code does. Previous research deployed a binary (correct/incorrect) autograder using bigram models that performed comparably to human teaching assistants on student responses. With a dataset of 3,064 student responses from 17 EiPE questions, we investigated multiple autograders for EiPE questions. We evaluated methods ranging from logistic regression trained on bigram features, to Support Vector Machines (SVMs) trained on embeddings from Large Language Models (LLMs), to GPT-4. We found multiple useful autograders, most with accuracies in the 86–88% range, each with different advantages: SVMs trained on LLM embeddings had the highest accuracy; few-shot chat completion with GPT-4 required minimal human effort; pipelines with multiple autograders for specific dimensions (what we call 3D autograders) can provide fine-grained feedback; and code generation with GPT-4 leverages automatic code testing as a grading mechanism, in exchange for slightly more lenient grading standards. While piloting these autograders in a non-major introductory Python course, students held largely similar views of all autograders, although they more often found the GPT-based and code-generation graders helpful and liked the code-generation grader the most.
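The simplest autograder family described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' pipeline: the toy responses and labels below are invented, and scikit-learn's `CountVectorizer` and `LogisticRegression` stand in for whatever bigram model and classifier the study actually used.

```python
# Hedged sketch: a binary (correct/incorrect) EiPE autograder built from
# n-gram counts and logistic regression. All training data is invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical student explanations of a "find the maximum" code snippet.
responses = [
    "returns the largest value in the list",
    "finds the maximum element of the array",
    "prints every number from one to ten",
    "loops and prints each value in order",
]
labels = [1, 1, 0, 0]  # 1 = correct description, 0 = incorrect

grader = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # unigram + bigram features
    LogisticRegression(),
)
grader.fit(responses, labels)

# Grade an unseen response.
pred = grader.predict(["gets the maximum value in the list"])[0]
```

A real deployment would train on thousands of labeled responses per question and report held-out accuracy; here the pipeline only shows the feature-extraction-plus-classifier shape of the approach.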
- Research Article
- 10.1145/3768340
- Dec 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Anna Bavaresco + 2 more
Improving the modeling of human representations of everyday semantic categories, such as animals or food, can lead to better alignment between AI systems and humans. Humans are thought to represent such categories using dimensions that capture relevant variance, thereby defining the relationships between category members. In AI systems, the representational space for a category is defined by the distances between its members; importantly, in this context, the same features are used for distance computations across all categories. In two experiments, we show that pruning a model’s feature space to better align with human representations of a category selects different model features, and thus different subspaces, for different categories. In addition, we provide a proof of concept demonstrating the relevance of these findings for evaluating the quality of images generated by AI systems.
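The core idea of category-specific pruning can be illustrated with a minimal sketch. This is not the authors' method: the embeddings and "human" dissimilarity judgments below are synthetic, and the selection rule (keep the dimensions whose pairwise distances correlate best with the human data) is a deliberately simple stand-in for whatever alignment procedure the paper uses.

```python
# Illustrative sketch of pruning an embedding space to align with human
# category judgments. All data is synthetic: the "human" dissimilarities
# are constructed to depend only on the first `keep` dimensions.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_items, n_dims, keep = 20, 16, 4
embeddings = rng.normal(size=(n_items, n_dims))

pairs = list(combinations(range(n_items), 2))
human = np.array([np.linalg.norm(embeddings[i, :keep] - embeddings[j, :keep])
                  for i, j in pairs])

def dim_score(d):
    # Correlation between one dimension's pairwise distances and the
    # human dissimilarity judgments.
    model = np.array([abs(embeddings[i, d] - embeddings[j, d])
                      for i, j in pairs])
    return np.corrcoef(model, human)[0, 1]

scores = np.array([dim_score(d) for d in range(n_dims)])
pruned = sorted(np.argsort(scores)[-keep:].tolist())  # best-aligned subspace
```

Running the same selection against human data for a different category would pick out a different subset of dimensions, which is the paper's central observation.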
- Research Article
- 10.1145/3771844
- Dec 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Nicole C Krämer + 4 more
Generative AI systems such as ChatGPT are increasingly used to assist with tasks or to obtain information. Given that these systems are not perfectly reliable in the content they produce, human users need to carefully gauge to what degree they can place trust in the system (calibrated trust). However, based on media-equation assumptions, it can be hypothesized that social cues displayed by the system might instill more trust than is warranted. Against this background, the present study uses a 2 × 2 between-subjects design (N = 617) to investigate whether the social cues “typing behavior” and “personalized address” used by ChatGPT increase perceived trust (benevolence, ability, and integrity) in the system, and whether this effect is mediated by perceived similarity and moderated by anthropomorphism inclination. The results show that the social cue “typing behavior” leads to a significant increase in trust on the benevolence dimension. Neither perceived similarity nor anthropomorphism inclination modulates this effect. However, as a side effect, perceived similarity was found to significantly predict trust in ChatGPT.
- Research Article
- 10.1145/3769072
- Dec 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Jeba Rezwana + 1 more
As AI becomes increasingly prevalent in creative domains, it is imperative to understand users’ mental models of AI in human–AI co-creation, as mental models shape user experiences. Additionally, gaining insight into users’ mental models is essential for the development of human-centered co-creative AI. This article introduces a framework for exploring users’ mental models of co-creative AI. Using a large-scale study (n = 155), we explore mental models of two existing AI systems, ChatGPT and Stable Diffusion, in co-creation contexts. Participants engaged in creative tasks with both AI systems and completed surveys, revealing insights into mental models and their associations with demographic factors and users’ ethical stances. The results highlight the major types and patterns of mental models of AI in co-creative contexts. Findings also reveal that individuals with expertise in AI typically hold Partnership-oriented mental models of co-creative AI, while those lacking AI literacy tend to hold more Tool-oriented mental models. Furthermore, individuals with Partnership-oriented mental models usually have a positive ethical perspective toward anthropomorphism in AI, data collection by AI, and AI’s societal impact. Additionally, results highlight that conversational co-creative AI is generally perceived as a collaborator, whereas non-conversational AI is typically viewed as a tool.
- Research Article
- 10.1145/3764591
- Dec 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Christopher Flathmann + 3 more
With the emergence of new AI technologies, research on the potential for AI to function as a teammate alongside humans has expanded. The recent introduction of highly capable large language models (LLMs) is particularly noteworthy, as they show strong potential in human–AI teaming, where communication is crucial. However, this novel technology has yet to be validated in human–AI teaming or as a teammate, hindering its application in research and practice. This article presents an empirical online experiment (N = 778) in which participants engaged in a real-time, interdependent interaction with a commercially available LLM, with the presentation of the LLM manipulated to be either a tool or a teammate. Results show that presenting an LLM as a teammate rather than a tool significantly increases trust and significantly affects the sentiment humans express when talking with their AI, with LLM teammates receiving more positive sentiment. Perceptions of trust, acceptance, and performance were generally high for LLMs presented as teammates. Despite these effects, participants’ prior experiences with AI technology were still shown to predict the perceptions they formed of their AI teammate. Based on these findings, this article presents an important empirical result: presenting highly capable AI, such as LLMs, as teammates can improve perception and interaction compared to presenting an AI as a tool. In turn, we discuss how future research can continue to identify when and how to introduce LLMs and other AI technologies as teammates.
- Research Article
- 10.1145/3778167
- Nov 27, 2025
- ACM Transactions on Interactive Intelligent Systems
- Ziquan Deng + 3 more
Time series anomaly detection is a critical machine learning task for numerous applications, such as finance, healthcare, and industrial systems. However, even high-performing models may exhibit potential issues such as biases, leading to unreliable outcomes and misplaced confidence. While model explanation techniques, particularly visual explanations, offer valuable insights by elucidating the attributions behind model decisions, several limitations remain: they are primarily instance-based and do not scale across a dataset, and they provide one-directional information from the model to the human, lacking a mechanism for users to address detected issues. To address these gaps, we introduce HILAD, a novel framework designed to foster dynamic and bidirectional collaboration between humans and AI for enhancing anomaly detection models in time series. Through our visual interface, HILAD empowers domain experts to detect, interpret, and correct unexpected model behaviors at scale. Our evaluation through user studies with two models and three time series datasets demonstrates the effectiveness of HILAD, which fosters deeper model understanding, immediate corrective action, and improved model reliability.
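The bidirectional loop the abstract describes (model flags, expert corrects, model adjusts) can be sketched at its most minimal. This is not the HILAD system: the z-score detector, the series, and the threshold-raising correction rule below are all invented for illustration of the human-in-the-loop pattern.

```python
# Minimal human-in-the-loop sketch: a z-score detector proposes anomalies,
# an expert marks a false positive, and the threshold is raised until the
# corrected point is no longer flagged. All data is invented.
import statistics

def flag_anomalies(series, threshold):
    """Flag indices whose z-score exceeds the threshold."""
    mu = statistics.mean(series)
    sigma = statistics.stdev(series)
    return [i for i, x in enumerate(series)
            if sigma > 0 and abs(x - mu) / sigma > threshold]

series = [10, 11, 9, 10, 30, 10, 11, 10, 29, 10]
threshold = 1.5
flags = flag_anomalies(series, threshold)  # detector's initial proposal

# Expert feedback: index 8 is a known benign spike (a false positive).
false_positives = {8}

# Corrective action: raise the threshold until corrected points drop out.
while false_positives & set(flag_anomalies(series, threshold)):
    threshold += 0.1
corrected = flag_anomalies(series, threshold)
```

In HILAD the correction happens through a visual interface and applies at scale across the dataset; this sketch only shows the feedback direction that instance-based explanations lack.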
- Research Article
- 10.1145/3774779
- Nov 5, 2025
- ACM Transactions on Interactive Intelligent Systems
- Elizabeth A Schlesener + 4 more
LLM-based conversational agents have become increasingly popular in recent years due to their novel capacity for natural, human-like dialogue. However, mistrust in LLMs persists due to concerns about privacy, the potential for incorrect responses (often referred to as ’hallucinations’), and issues related to social bias. Previous AI research shows that anthropomorphic form positively influences users’ perceptions, but this aspect remains under-explored in research on LLM-based conversational agents. Our research features two anthropomorphic forms: embodied and behavioral. Embodied Anthropomorphic Form (EA) encompasses chatbot, chatbot with text-to-speech (TTS), and embodied conversational agent (ECA) interface designs. Behavioral Anthropomorphic Form (BA) involves LLMs instructed with and without Theory of Mind (ToM) principles. In an empirical evaluation, we explored how the interplay between BA and EA forms affects users’ perceptions of LLM-based conversational agents in terms of trust, anthropomorphism, presence, usability, and user experience. Our findings provide evidence of such effects, offering novel insight into the influence of both anthropomorphic forms on perceived anthropomorphism, presence, usability, and user experience, and into their positive impact on user trust in LLM-based conversational agents. However, the combined highest (i.e., ECA with ToM behaviors) and lowest (i.e., chatbot without ToM behaviors) levels of both forms result in lower user trust, suggesting a complex relationship between embodiment and ToM behaviors that warrants further investigation.
- Research Article
- 10.1145/3774657
- Nov 3, 2025
- ACM Transactions on Interactive Intelligent Systems
- Thomas Langerak + 4 more
Explainable AI (XAI) offers solutions to the challenges of predictability and interpretability in adaptive interfaces, particularly in Augmented Reality (AR) systems that dynamically adapt information based on situational context. While traditional XAI methods highlight the contextual factors influencing adaptations, they often overlook the user's internal understanding, such as their expertise and contextual perceptions. This omission can result in explanations that feel redundant or obvious. We present XAIUI, a computational approach that generates tailored explanations by integrating the system's adaptation model with a Bayesian model of the user's internal representation. Two online studies evaluated XAIUI. In the first study (N = 77), participants ranked XAIUI's explanations as the most preferred compared to four ablations (\(\chi^{2}(4)=62.28, p<.001\)). In the second study (N = 110), XAIUI's explanations were rated significantly less complex (\(\chi^{2}(4)=840.855, p<.001\)) than all ablations except the no-explanation condition. Our results demonstrate XAIUI's ability to deliver user-centric, concise, and intuitive explanations, highlighting its potential to enhance AI-driven interfaces.
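The idea of conditioning explanations on a Bayesian estimate of what the user already knows can be sketched with a simple Beta-Bernoulli model. This is not the XAIUI implementation: the factor names, priors, and observations below are invented, and the rule "explain only factors the user is unlikely to know" is a stand-in for the paper's actual user model.

```python
# Hedged sketch of a Bayesian user model for explanation selection.
# Each contextual factor carries a Beta posterior over whether the user
# already knows it; only unlikely-known factors get explained.

def posterior_known(alpha, beta, observations):
    """Beta posterior mean of P(user knows factor) after binary evidence
    (1 = user acted as if they knew the factor, 0 = they did not)."""
    alpha += sum(observations)
    beta += len(observations) - sum(observations)
    return alpha / (alpha + beta)

# Hypothetical interaction history per contextual factor.
factors = {
    "low ambient light": [1, 1, 1],  # user repeatedly acted on this factor
    "novel task step":   [0, 0],     # user showed no familiarity
}
prior = (1.0, 1.0)  # uniform Beta(1, 1) prior

to_explain = [f for f, obs in factors.items()
              if posterior_known(*prior, obs) < 0.5]
```

Filtering explanations this way is one plausible route to the "less redundant, less obvious" behavior the studies measure: well-evidenced factors are suppressed while genuinely novel ones are surfaced.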