- Research Article
- 10.1145/3779127
- Dec 15, 2025
- ACM Transactions on Interactive Intelligent Systems
- Yunyao Li + 2 more
Generative AI increasingly reshapes how people engage with interactive systems. It now plays a vital role in designing, studying, and refining human-centered methods that let individuals interact and collaborate with AI, strengthening their agency and control. This special issue highlights the human role in Generative AI and seeks approaches that equip diverse stakeholders across socio-technical contexts to understand, direct, and steer these systems while enabling responsible innovation. We publish in this special issue original research on new interaction techniques that integrate human input into Generative AI’s continual development, studies of interaction paradigms that support more effective human–AI collaboration, and work that deepens understanding of model capabilities. Thus, we aim to build a research community around Human-Centric GenAI that empowers people to actively shape systems in line with their values, needs, and expectations.
- Research Article
- 10.1145/3744750
- Dec 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Matt-Heun Hong + 1 more
We investigate the potential of leveraging the code-generating capabilities of Large Language Models (LLMs) to support exploratory visual analysis (EVA) via conversational user interfaces (CUIs). We developed a technology probe that was deployed through two studies with a total of 50 data workers to explore the structure and flow of visual analytic conversations during EVA. We analyzed conversations from both studies using thematic analysis and derived a state transition diagram summarizing the conversational flow between four states of participant utterances (Analytic Tasks, Editing Operations, Elaborations and Enrichments, and Directive Commands) and two states of Generative AI (GenAI) agent responses (visualization, text). We describe the capabilities and limitations of GenAI agents according to each state and transitions between states as three co-occurring loops: analysis elaboration, refinement, and explanation. We discuss our findings as future research trajectories to improve the experiences of data workers using GenAI. The code and data are available at https://osf.io/6wxpa.
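The state transition diagram the abstract describes can be thought of as a table of counts over coded utterance states. The following is a minimal illustrative sketch, not the authors' code (their materials are at https://osf.io/6wxpa); the state names are taken from the abstract, and the example conversation is invented:

```python
# Sketch: counting state-to-state transitions in coded visual analytic
# conversations, using the four participant states and two agent states
# named in the abstract. Conversations here are stand-in examples.
from collections import Counter

PARTICIPANT_STATES = {"analytic_task", "editing_operation",
                      "elaboration_enrichment", "directive_command"}
AGENT_STATES = {"visualization", "text"}

def transition_counts(conversations):
    """Count (source, destination) state transitions across conversations.

    `conversations` is a list of coded state sequences, e.g.
    [["analytic_task", "visualization", "editing_operation", ...], ...].
    """
    counts = Counter()
    for seq in conversations:
        for src, dst in zip(seq, seq[1:]):
            counts[(src, dst)] += 1
    return counts

# A short invented conversation: task -> chart -> edit -> revised chart.
convo = ["analytic_task", "visualization",
         "editing_operation", "visualization"]
print(transition_counts([convo]))
```

Normalizing each row of such a table would give the empirical transition probabilities that a diagram like the one described summarizes.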
- Research Article
- 10.1145/3756326
- Dec 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Angela Mastrianni + 8 more
Generative AI has the potential to transform knowledge work, but further research is needed to understand how knowledge workers envision using and interacting with generative AI. We investigate the development of generative AI tools to support domain experts in knowledge work, examining task delegation and the design of human–AI interactions. Our research focused on designing a generative AI assistant to aid genetic professionals in analyzing whole genome sequences (WGS) and other clinical data for rare disease diagnosis. Through interviews with 17 genetics professionals, we identified current challenges in WGS analysis. We then conducted co-design sessions with six genetics professionals to determine tasks that could be supported by an AI assistant and considerations for designing interactions with the AI assistant. From our findings, we identified sensemaking as both a current challenge in WGS analysis and a process that could be supported by AI. We contribute an understanding of how domain experts envision interacting with generative AI in their knowledge work, a detailed empirical study of WGS analysis, and three design considerations for using generative AI to support domain experts in sensemaking during knowledge work.
- Research Article
- 10.1145/3774752
- Dec 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Max Fowler + 5 more
Code-reading ability has traditionally been under-emphasized in assessments as it is difficult to assess at scale. Prior research has shown that code-reading and code-writing are closely related skills; thus, being able to assess and train code-reading skills may be necessary for student learning. One way to assess code-reading ability is using Explain in Plain English (EiPE) questions, which ask students to describe what a piece of code does in natural language. Previous research deployed a binary (correct/incorrect) autograder using bigram models that performed comparably with human teaching assistants on student responses. With a dataset of 3,064 student responses from 17 EiPE questions, we investigated multiple autograders for EiPE questions. We evaluated methods ranging from logistic regression trained on bigram features, to Support Vector Machines (SVMs) trained on embeddings from Large Language Models (LLMs), to GPT-4. We found multiple useful autograders, most with accuracies in the 86–88% range, each with different advantages: SVMs trained on LLM embeddings had the highest accuracy; few-shot chat completion with GPT-4 required minimal human effort; pipelines with multiple autograders for specific dimensions (what we call 3D autograders) can provide fine-grained feedback; and code generation with GPT-4 leverages automatic code testing as a grading mechanism, in exchange for slightly more lenient grading standards. While piloting these autograders in a non-major introductory Python course, students had largely similar views of all autograders, although they more often found the GPT-based and code-generation graders helpful and liked the code-generation grader the most.
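The simplest baseline the abstract mentions, logistic regression on bigram features, can be sketched in a few lines of scikit-learn. This is an illustrative toy, not the study's pipeline; the example responses, labels, and hyperparameters are invented:

```python
# Hedged sketch of a bigram-feature logistic-regression EiPE autograder.
# Training data below is a toy stand-in for real labeled student responses.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Student explanations of one code snippet, labeled correct (1) / incorrect (0).
responses = [
    "returns the largest value in the list",
    "finds the maximum element of the array",
    "prints every element of the list",
    "adds one to each number in the list",
]
labels = [1, 1, 0, 0]

grader = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),   # unigram + bigram counts
    LogisticRegression(max_iter=1000),
)
grader.fit(responses, labels)

print(grader.predict(["computes the maximum value of a list"]))
```

Swapping the `CountVectorizer` stage for precomputed LLM embeddings and the classifier for an SVM would give the higher-accuracy variant the abstract reports.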
- Research Article
- 10.1145/3768340
- Dec 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Anna Bavaresco + 2 more
Improving the modeling of human representations of everyday semantic categories, such as animals or food, can lead to better alignment between AI systems and humans. Humans are thought to represent such categories using dimensions that capture relevant variance, in this way defining the relationship between category members. In AI systems, the representational space for a category is defined by the distances between its members. Importantly, in this context, the same features are used for distance computations across all categories. In two experiments, we show that pruning a model’s feature space to better align with human representations of a category selects for different model features and different subspaces for different categories. In addition, we provide a proof of concept demonstrating the relevance of these findings for evaluating the quality of images generated by AI systems.
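The pruning idea in the abstract — selecting a category-specific subspace of model features so that pairwise distances better match human similarity judgments — can be illustrated with a small greedy sketch. This is not the authors' method; the embeddings, human scores, and the backward-elimination heuristic are all stand-ins for illustration:

```python
# Illustrative sketch: greedily prune embedding dimensions so that distances
# in the remaining subspace correlate better with human similarity judgments
# for one category. All data below is randomly generated for demonstration.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
items, dims = 6, 8                      # e.g., six animals, eight features
emb = rng.normal(size=(items, dims))    # stand-in model embeddings
human_sim = rng.uniform(size=items * (items - 1) // 2)  # stand-in judgments

def alignment(feature_idx):
    """Correlation between human similarity and negative pairwise
    distances computed in the selected feature subspace."""
    sub = emb[:, feature_idx]
    dists = np.array([np.linalg.norm(sub[i] - sub[j])
                      for i, j in combinations(range(items), 2)])
    return np.corrcoef(human_sim, -dists)[0, 1]

# Greedy backward elimination: drop any dimension whose removal improves
# alignment, and repeat until no single removal helps.
selected = list(range(dims))
improved = True
while improved and len(selected) > 1:
    improved = False
    base = alignment(selected)
    for d in list(selected):
        trial = [x for x in selected if x != d]
        if alignment(trial) > base:
            selected, improved = trial, True
            break

print("kept dimensions:", selected)
```

Running such a procedure per category is one way to see the abstract's finding that different categories select different features and subspaces.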
- Research Article
- 10.1145/3769072
- Dec 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Jeba Rezwana + 1 more
As AI becomes increasingly prevalent in creative domains, it is imperative to understand users’ mental models of AI in human–AI co-creation, as mental models shape user experiences. Additionally, gaining insights into users’ mental models is essential for the development of human-centered co-creative AI. This article introduces a framework for exploring users’ mental models of co-creative AI. Using a large-scale study (n = 155), we explore mental models of two existing AI systems, ChatGPT and Stable Diffusion, in co-creation contexts. Participants engaged in creative tasks with both AI systems and completed surveys, revealing insights into mental models and their associations with demographic factors and users’ ethical stances. The results highlight the major types and patterns of mental models of AI in co-creative contexts. Findings also reveal that individuals with expertise in AI typically have Partnership-oriented mental models of co-creative AI, while those lacking AI literacy tend to have more Tool-oriented mental models. Furthermore, individuals with Partnership-oriented mental models usually have a positive ethical perspective toward anthropomorphism in AI, data collection by AI, and AI’s societal impact. Additionally, results highlight that conversational co-creative AI is generally perceived as a collaborator, whereas non-conversational AI is typically viewed as a tool.
- Research Article
- 10.1145/3771844
- Dec 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Nicole C Krämer + 4 more
Generative AI systems such as ChatGPT are increasingly used to assist with tasks or to receive information. Given that these systems are not perfectly reliable regarding the content they produce, human users need to carefully navigate to what degree they can place trust in the system (calibrated trust). However, based on media equation assumptions, it can be hypothesized that social cues displayed by the system might instill more trust than is warranted. Against this background, the present study investigates in a 2 × 2 between-subjects design (N = 617) whether the social cues “typing behavior” and “personalized address” used by ChatGPT increase perceived trust (benevolence, ability, and integrity) in the system, and whether this effect is mediated by perceived similarity and moderated by anthropomorphism inclination. The results show that the social cue “typing behavior” leads to a significant increase in trust on the benevolence dimension. Neither perceived similarity nor anthropomorphism inclination modulates this effect. However, as a side effect, perceived similarity was found to significantly predict trust in ChatGPT.
- Research Article
- 10.1145/3764591
- Dec 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Christopher Flathmann + 3 more
With the emergence of new AI technologies, research on the potential for AI to function as teammates alongside humans has expanded. The recent introduction of highly capable large language models (LLMs) is particularly noteworthy, showing strong potential in human–AI teaming where communication is crucial. However, this novel technology has yet to be validated in human–AI teaming or as a teammate, hindering its application in research and practice. This article presents an empirical online experiment (N = 778) in which participants engaged in a real-time and interdependent interaction with a commercially available LLM, with the presentation of the LLM manipulated to be either a tool or a teammate. Results show that presenting an LLM as a teammate rather than a tool significantly increases trust and significantly affects the sentiment humans express when talking with their AI, with LLM teammates eliciting more positive sentiment. Perceptions of trust, acceptance, and performance were generally high for LLMs presented as teammates. Despite these effects, participants’ prior experiences with AI technology were still shown to predict the perceptions they formed of their AI teammate. Based on these findings, this article presents an important empirical result: presenting highly capable AI, such as LLMs, as teammates can improve perception and interaction compared to presenting an AI as a tool. In turn, we discuss how future research can continue to identify when and how to introduce LLMs and other AI technologies as teammates.
- Research Article
- 10.1145/3722227
- Dec 10, 2025
- ACM Transactions on Interactive Intelligent Systems
- Jarod Govers + 3 more
The start of the 2020s ushered in a new era of AI through the rise of Generative AI Large Language Models (LLMs) such as ChatGPT. These AI chatbots offer a form of interactive agency by enabling users to ask questions and query for more information. However, prior research only considers whether LLMs have a political bias or agenda, not how a biased LLM can impact a user’s opinion and trust. Our study bridges this gap by investigating a scenario where users read online news articles and then engage with an interactive AI chatbot, where both the news and the AI are biased to hold a particular stance on a news topic. Interestingly, participants were far more likely to adopt the narrative of a biased chatbot over news articles with an opposing stance. Participants were also substantially more inclined to adopt the chatbot’s narrative if its stance aligned with the news—all compared to a control, news-article-only group. Our findings suggest that the very interactive agency offered by an AI chatbot significantly enhances its perceived trust and persuasive ability compared to the ‘static’ articles from established news outlets, raising concerns about the potential for AI-driven indoctrination. We outline the reasons behind this phenomenon and conclude with the implications of biased LLMs for HCI research, as well as the risks of Generative AI undermining democratic integrity through AI-driven Information Warfare.
- Research Article
- 10.1145/3778167
- Nov 27, 2025
- ACM Transactions on Interactive Intelligent Systems
- Ziquan Deng + 3 more
Time series anomaly detection is a critical machine learning task for numerous applications, such as finance, healthcare, and industrial systems. However, even high-performing models may exhibit potential issues such as biases, leading to unreliable outcomes and misplaced confidence. While model explanation techniques, particularly visual explanations, offer valuable insights by elucidating the attributions behind model decisions, many limitations still exist: they are primarily instance-based and not scalable across the dataset, and they provide one-directional information from the model to the human side, lacking a mechanism for users to address detected issues. To fill these gaps, we introduce HILAD, a novel framework designed to foster a dynamic and bidirectional collaboration between humans and AI for enhancing anomaly detection models in time series. Through our visual interface, HILAD empowers domain experts to detect, interpret, and correct unexpected model behaviors at scale. Our evaluation through user studies with two models and three time series datasets demonstrates the effectiveness of HILAD, which fosters deeper model understanding, immediate corrective actions, and enhanced model reliability.