- Research Article
- 10.1145/3776541
- Jan 20, 2026
- ACM Transactions on Intelligent Systems and Technology
- Karolina Seweryn + 2 more
Analyzing action scenes in soccer is a challenging task due to the complex and dynamic nature of the game, as well as the interactions between players. This article provides a comprehensive overview of this task, divided into action recognition, action spotting (locating key moments in time), and spatio-temporal action localization (identifying actions in both time and space). We explore publicly available data sources and the metrics used to evaluate models’ performance. The article reviews both traditional approaches and recent state-of-the-art methods that leverage deep learning. Our analysis begins with methods based on feature engineering, followed by an exploration of various deep learning techniques, including Convolutional Neural Networks (CNNs) for visual information processing, Recurrent Neural Networks (RNNs) for analyzing temporal sequences, and transformer architectures for effectively capturing context. In particular, we focus on the specifics of multimodal data, illustrating its potential for improved model accuracy and robustness. This includes methods that integrate information from multiple sources, such as video and audio, as well as methods that represent a single data source through multiple analytical lenses (e.g., a graph representation of players), offering a richer, more nuanced understanding of soccer actions. Finally, the article highlights open research questions and future directions in soccer action analysis, especially the potential for multimodal methods to advance the field. Overall, this survey provides a valuable resource for researchers interested in analyzing action scenes in soccer.
- Research Article
- 10.1145/3787464
- Jan 19, 2026
- ACM Transactions on Intelligent Systems and Technology
- Changchang Yin + 3 more
Individual treatment effect (ITE) estimation evaluates the causal effects of treatment strategies on important outcomes, a crucial problem in healthcare. Most existing ITE estimation methods are designed for centralized settings. In real-world clinical scenarios, however, raw data usually cannot be shared among hospitals due to potential privacy and security risks, rendering such methods inapplicable. In this work, we study the ITE estimation task in a federated setting, which allows us to harness decentralized data from multiple hospitals. Due to the unavoidable confounding bias in the collected data, a model learned directly from it would be inaccurate. One well-known solution is Inverse Probability Treatment Weighting (IPTW), which uses the conditional probability of treatment given the covariates to re-weight each training example. Applying IPTW in a federated setting, however, is non-trivial: we found that even with a well-estimated conditional probability, the local model training step using each hospital's data alone still suffers from confounding bias. To address this, we propose FED-IPTW, a novel algorithm that extends IPTW to the federated setting by enforcing both global (over all the data) and local (within each hospital) decorrelation between covariates and treatments. We validated our approach on the task of evaluating the treatment effect of mechanical ventilation on the survival probability of intensive care unit (ICU) patients with breathing difficulties. We conducted experiments on both synthetic and real-world eICU datasets, and the results show that FED-IPTW outperforms state-of-the-art methods on all metrics in both factual prediction and ITE estimation, paving the way for personalized treatment strategy design in mechanical ventilation usage.
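The IPTW re-weighting idea at the heart of this abstract is a standard causal-inference estimator and can be sketched independently of FED-IPTW's federated machinery. Below is a minimal, generic illustration using stabilized weights and clipped propensities; the function and variable names are illustrative, not taken from the paper:

```python
import numpy as np

def iptw_weights(treatment, propensity, eps=1e-6):
    """Stabilized inverse-probability-of-treatment weights.

    treatment:  binary array, 1 = treated, 0 = control
    propensity: estimated conditional probability P(T=1 | X) per example
    """
    propensity = np.clip(propensity, eps, 1 - eps)   # avoid division blow-up
    p_treated = treatment.mean()                      # marginal P(T=1), stabilizer
    # Treated examples get p/e(x); controls get (1-p)/(1-e(x)).
    return np.where(treatment == 1,
                    p_treated / propensity,
                    (1 - p_treated) / (1 - propensity))

# Toy example: treatment assignment is confounded (treated units have
# high propensities), so rare treated/control cases get up-weighted.
t = np.array([1, 1, 1, 0, 0, 0])
p = np.array([0.8, 0.6, 0.7, 0.3, 0.2, 0.4])
w = iptw_weights(t, p)
# Re-weighting each training example by w approximately balances the
# treated and control pseudo-populations in expectation.
```

A downstream outcome model would then multiply each example's loss by its weight; FED-IPTW's contribution, per the abstract, is making this decorrelation hold both globally and within each hospital.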
- Research Article
- 10.1145/3787465
- Jan 19, 2026
- ACM Transactions on Intelligent Systems and Technology
- Sheema Madhusudhanan + 1 more
The increasing adoption of Digital Twins (DT) driven by the Internet of Things (IoT) in critical domains such as healthcare, smart energy, and mobility introduces unprecedented privacy risks due to continuous data collection, contextual sensitivity, and user traceability. We propose PRiD\(\varepsilon\) (PRivacy in DT with minimum privacy budget \(\varepsilon\)), a context- and pattern-aware, genetically optimized adaptive Differential Privacy (DP) model that secures DT through a modular four-layer architecture. PRiD\(\varepsilon\) combines contextual sensitivity estimation, domain-specific heuristics, and genetic noise injection to achieve adaptive per-pattern DP guarantees. It integrates Federated Learning (FL), dynamically tuning \(\varepsilon\) across local nodes based on sensitivity feedback and evolving model requirements. A privacy-sensitive access control mechanism regulates query responses by role, budget, and pattern-level risk. Evaluations across healthcare, smart energy, and mobility demonstrate high utility (\(>95\%\)) at low \(\varepsilon\in[0.1,0.35]\), and strong resilience against reconstruction, inference, and \(\varepsilon\)-variation exploitation attacks. PRiD\(\varepsilon\) supports scalable, privacy-preserving DT modeling, with theoretical analysis and empirical benchmarking indicating an overall worst-case complexity of \(\mathcal{O}(n\log n)\) under the proposed pipeline.
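As background for the per-pattern budgets described above, the standard building block is the Laplace mechanism: an \(\varepsilon\)-DP numeric release adds noise with scale sensitivity/\(\varepsilon\), so a smaller budget yields more noise. The sketch below shows only this generic mechanism with a per-pattern budget table, not PRiD\(\varepsilon\)'s genetic optimizer or access-control layers; all names and values are illustrative:

```python
import random

def laplace_noise(scale):
    # Laplace(0, scale) sampled as the difference of two Exp(1) variates,
    # using only the standard library.
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def dp_release(value, sensitivity, epsilon):
    """epsilon-DP release of a numeric query via the Laplace mechanism.

    Smaller epsilon (a tighter privacy budget) means a larger noise
    scale sensitivity/epsilon, i.e., stronger protection, lower utility.
    """
    return value + laplace_noise(sensitivity / epsilon)

# Per-pattern budgets: more sensitive patterns get a smaller epsilon,
# hence noisier answers (budgets chosen from the paper's reported range).
budgets = {"heart_rate": 0.1, "room_temp": 0.35}
reading = {"heart_rate": 72.0, "room_temp": 21.5}
noisy = {name: dp_release(v, sensitivity=1.0, epsilon=budgets[name])
         for name, v in reading.items()}
```

In an adaptive scheme like the one the abstract describes, the `budgets` table would itself be tuned per pattern from sensitivity feedback rather than fixed by hand.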
- Research Article
- 10.1145/3787457
- Jan 19, 2026
- ACM Transactions on Intelligent Systems and Technology
- Kai Di + 4 more
In industrial production processes, disruptions within the industrial chain can severely affect the collaborative capabilities of production agents. A notable example occurred during the COVID-19 pandemic, when many agents faced interruption risks and were unable to participate in coordinated production. Ensuring continuity under such conditions requires migrating tasks from disrupted agents to viable alternatives. Designing effective task migration strategies, however, must account for the emergent multiplex nature of modern industrial chains. In these multiplex networked industrial chains, disruption risk in one layer can propagate to others, generating cascading failures across the system. This introduces two key challenges: (1) disruption risk creates mismatches not only between production agents and tasks but also across network layers, enlarging the problem dimensionality; and (2) simultaneous disruptions across multiple agents and layers increase the volume of tasks needing migration, greatly expanding the solution space. To address these challenges, we introduce the notion of a multiplex potential field, which captures cross-layer interdependencies and system-level dynamics in multiplex industrial chains. Building on this concept, we develop a hierarchical contextual task migration algorithm that exploits the multiplex potential field to guide both inter-layer and intra-layer task reallocations. Extensive experiments show that our approach consistently achieves superior utility, markedly improves task completion ratios, and reduces execution costs compared to benchmark algorithms. Furthermore, it attains solution quality comparable to that of the optimal CPLEX solver while requiring substantially less computation time. Finally, a case study on the FAO international food trade network demonstrates that the proposed framework is not only theoretically robust but also practically effective when deployed on large-scale real-world multiplex systems.
- Research Article
- 10.1145/3787973
- Jan 19, 2026
- ACM Transactions on Intelligent Systems and Technology
- Leonard Christopher Limanjaya + 1 more
Reinforcement learning (RL) excels in fully observable environments but faces significant challenges in partially observable scenarios, modeled as Partially Observable Markov Decision Processes (POMDPs), where agents must infer hidden states from noisy observations. We propose SIGHT (Sequential Inference with Guided Hidden Trajectories), a novel framework that integrates attention-guided hidden state prediction into an actor-critic architecture. By dynamically prioritizing relevant historical information and anticipating future dynamics, SIGHT combines the adaptability of model-free RL with the predictive foresight of model-based methods, creating a hybrid “model-based-like model-free” approach. Validated across position-based and velocity-based tasks, SIGHT outperforms state-of-the-art methods, including the POMDP Baseline, Neural Ordinary Differential Equations, and Variational Recurrent Models, achieving higher returns, stability, and adaptability. Our analysis further demonstrates the impact of sequence length, RL algorithm suitability, and attention mechanisms on performance, highlighting SIGHT's potential for advancing RL in real-world partially observable environments.
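The "attention-guided" prioritization of historical information mentioned above rests on standard scaled dot-product attention: past observations whose keys align with the current query receive larger softmax weights. The sketch below is this generic mechanism under assumed shapes, not SIGHT's actual architecture; all names are illustrative:

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention over a history of observations.

    query:  (d,)   current belief/state summary
    keys:   (T, d) one key per past time step
    values: (T, d) one value per past time step
    Returns the attention-weighted context vector and the weights.
    """
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)   # relevance of each past step
    w = np.exp(scores - scores.max())    # numerically stable softmax
    w = w / w.sum()
    return w @ values, w                 # context vector, weights

# History of 4 observations (dim 2); the query is the current belief state.
keys = np.array([[1., 0.], [0., 1.], [1., 1.], [0., 0.]])
vals = keys.copy()
q = np.array([1., 0.])
ctx, w = attention(q, keys, vals)
# Steps whose keys align with the query dominate the context vector,
# which is what lets a POMDP agent focus on the relevant history.
```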
- Research Article
- 10.1145/3779133
- Jan 19, 2026
- ACM Transactions on Intelligent Systems and Technology
- Amarda Shehu
The dominant framing of Artificial General Intelligence (AGI) as a discrete breakthrough obscures the more urgent reality: AGI is arriving as a gradual, cumulative erosion of human verification power distributed across institutions and decision-making systems. This article reframes the AGI transition through the lens of absorption capacity; that is, the rate at which human systems can integrate, govern, and maintain meaningful oversight of increasingly autonomous AI. Drawing on empirical observations from deploying enterprise-scale generative AI in a large public university and on personal experience as a long-standing AI researcher and educator, I identify three critical asymmetries characterizing this transition: (1) governance lag, where policy cycles cannot match technological iteration speed; (2) institutional misalignment, where locally rational AI systems produce collectively irrational societal outcomes; and (3) capability inequality, where uneven access to AI amplifies structural advantage. I argue that the defining challenge is not achieving technical alignment with human values, but maintaining epistemic authority: the human capacity to verify, understand, and steer systems whose reasoning unfolds in latent spaces beyond direct audit. The article concludes that the true measure of preparedness for AGI is not computational power or algorithmic sophistication, but adaptive governance: institutional architectures capable of co-evolving with the technologies they must regulate. The frontier is not artificial superintelligence. It is the collective human capacity to remain intelligible to ourselves while embedded in AI-mediated decision ecosystems.
- Research Article
- 10.1145/3779427
- Jan 19, 2026
- ACM Transactions on Intelligent Systems and Technology
- Zhen Tao + 4 more
The increasing prevalence of large language models (LLMs) has significantly advanced text generation, but the human-like quality of LLM outputs presents major challenges in reliably distinguishing between human-authored and LLM-generated texts. Existing detection benchmarks are constrained by their reliance on static datasets, scenario-specific tasks (e.g., question answering and text refinement), and a primary focus on English, overlooking the diverse linguistic and operational subtleties of LLMs. To address these gaps, we propose CUDRT, a comprehensive evaluation framework and bilingual benchmark in Chinese and English, categorizing LLM activities into five key operations: Create, Update, Delete, Rewrite, and Translate. CUDRT provides extensive datasets tailored to each operation, featuring outputs from state-of-the-art LLMs to assess the reliability of LLM-generated text detectors. This framework supports scalable, reproducible experiments and enables in-depth analysis of how operational diversity, bilingual training sets, and LLM architectures influence detection performance. Our extensive experiments demonstrate the framework’s capacity to optimize detection systems and provide practical guidance for training model-based detectors, revealing that training on specific operations and outputs from certain LLMs significantly improves model-based detector generalization. By advancing robust methodologies for identifying LLM-generated texts, this work contributes to the development of intelligent systems capable of meeting real-world bilingual detection challenges. Source code and dataset are available at GitHub.
- Research Article
- 10.1145/3787975
- Jan 19, 2026
- ACM Transactions on Intelligent Systems and Technology
- Tianru Xie + 4 more
The urban foundation model is critical for trajectory-based mobile applications, which require accurate synthesis of paths that adhere to spatial constraints (road networks) and contextual constraints (e.g., weather, traffic). However, existing methods predominantly rely on task-specific models, which fail to holistically capture and integrate diverse spatial patterns (e.g., connectivity) and temporal dynamics (e.g., periodicity, trends) within a cohesive framework, limiting their generalization across diverse prediction tasks. To bridge this gap, we propose AutoDiff, a diffusion-based model that generates trajectories on a spatiotemporal graph (STG), establishing a new paradigm for trajectory generation as a foundation model for sequential spatiotemporal data. Specifically, we disentangle complex spatiotemporal features into generalizable segment-wise time slices on road networks through autoregressive diffusion generation, which not only enforces realistic trajectory connectivity within road networks, but also enables knowledge transfer across tasks such as trajectory recovery and travel time prediction. In addition, we design a confidence-based early-exiting mechanism that eliminates redundant denoising steps without sacrificing quality, enabling scalable applications in mobility analytics. Extensive experiments on three real-world urban trajectory datasets demonstrate the superior performance of AutoDiff on path prediction, trajectory recovery, and travel time estimation tasks, outperforming task-specific baselines while maintaining computational efficiency.
- Research Article
- 10.1145/3776570
- Jan 13, 2026
- ACM Transactions on Intelligent Systems and Technology
- Shucun Fu + 4 more
This is a corrigendum for the article “DESIGN: Online Device Selection and Edge Association for Federated Synergy Learning-enabled AIoT” published in ACM Trans. Intell. Syst. Technol. 15, 5, Article 104 (November 2024), 28 pages.
- Research Article
- 10.1145/3786603
- Dec 29, 2025
- ACM Transactions on Intelligent Systems and Technology
- Jun-Wei Chiu + 1 more
Graph neural networks (GNNs) have recently achieved remarkable performance on the node classification task. Most typical GNN models presume that the graph data is clean; in practice, however, graphs can be polluted by various kinds of noise that hurt prediction accuracy. Moreover, since GNNs rely on sufficient labeled data to propagate the supervision signal, a more realistic setting is to learn with a limited amount of labeled data, i.e., under label scarcity. In this work, we aim to build a GNN that is holistically robust against four different types of graph noise, including adversarial attacks, edge sparsity, noisy labels, and high heterophily, in the presence of label scarcity. We propose a novel GNN framework, Holistically Robust Graph Neural Networks (HRGNN), to fulfill this goal. The main idea of HRGNN is to create synthetic nodes with labels and learn to properly connect them to existing nodes. Through the message-passing mechanism of GNNs, the synthetic nodes inject reliable information into existing nodes, which helps purify their polluted representations and alleviates the negative effects of the various noise types. Furthermore, edge filtering in HRGNN removes noisy edges to prevent the propagation of incorrect information, while pseudo-labeling provides additional label information to defend against label scarcity and label noise. Experiments conducted on eight graph datasets show that HRGNN consistently outperforms state-of-the-art competing GNN models in four types of noisy settings with label scarcity. To the best of our knowledge, HRGNN is the first GNN model that is holistically robust to various types of noise and label scarcity.
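The purification effect the abstract attributes to message passing can be seen in a toy example: a clean synthetic node wired into a noisy neighborhood pulls its neighbors' representations toward its own value. The sketch below uses plain mean aggregation with self-loops; it illustrates generic GNN propagation only, not HRGNN's learned connection policy or edge filtering, and all names are illustrative:

```python
import numpy as np

def message_passing(features, adj, steps=2):
    """Repeated one-hop mean aggregation with self-loops.

    features: (N, d) node feature matrix
    adj:      (N, N) symmetric 0/1 adjacency matrix
    Each step replaces a node's features with the mean over itself
    and its neighbors, the basic GNN propagation rule.
    """
    a = adj + np.eye(adj.shape[0])           # add self-loops
    a = a / a.sum(axis=1, keepdims=True)     # row-normalize to means
    h = features
    for _ in range(steps):
        h = a @ h
    return h

# 3 noisy real nodes plus 1 clean synthetic node (index 3) that carries
# a reliable label-0 signal and is connected to all real nodes.
feats = np.array([[0.9], [1.1], [1.0], [0.0]])
adj = np.zeros((4, 4))
adj[3, :3] = adj[:3, 3] = 1.0                # wire synthetic node to real ones
out = message_passing(feats, adj)
# After propagation, every real node's representation has moved toward
# the synthetic node's clean value, diluting the noise.
```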