
Related Topics

  • Salient Object Detection
  • Human-object Interaction
  • Salient Object

Articles published on Human-Object Interaction Detection

116 search results (sorted by recency)
  • Research Article
  • 10.1016/j.eswa.2025.130216
Egocentric human-object interaction detection: A new benchmark and method
  • Mar 1, 2026
  • Expert Systems with Applications
  • Kunyuan Deng + 2 more

  • Research Article
  • Cited by 1
  • 10.1016/j.asoc.2026.114765
Soft-label guided multi-granularity prompts learning for human-object interaction detection
  • Feb 1, 2026
  • Applied Soft Computing
  • Xiaoqian Han + 4 more

  • Research Article
  • 10.1016/j.engappai.2025.113406
Appearance-semantic graphical model for human-object interaction detection
  • Feb 1, 2026
  • Engineering Applications of Artificial Intelligence
  • Qing Ye + 3 more

  • Research Article
  • 10.1109/access.2026.3678513
GeCHO: Generation of Contextualized Human-Object interactions
  • Jan 1, 2026
  • IEEE Access
  • Giovanni Minelli + 4 more

Creating realistic human-world interactions with diffusion models remains a key challenge, often requiring tedious trial and error and iterative manual refinement to achieve the desired result. Current approaches either fail to integrate new content seamlessly while maintaining global scene consistency, or require time-consuming editing and prompt engineering, making them impractical for large-scale applications. To address this challenge, we propose GeCHO, an inpainting approach that specifically tackles the complexities of generating contextual human-object interactions. Our method improves local object fidelity and global scene consistency by leveraging cross-attention maps for automated, annotation-free object placement and by using ControlNet to ensure precise spatial localization. We demonstrate the practical impact of our approach through two key applications: natural image inpainting, where we achieve contextual object placement with flexible spatial control, and human-object interaction detection, where we address long-tail distributions through synthetic data generation. Our results show that the proposed method improves the realism of generated images and their adherence to text prompts, simplifies the generation of complex scenes without extensive input engineering, and improves performance in computer vision tasks limited by data scarcity. The source code is available at https://github.com/johnMinelli/gecho.
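
GeCHO is described as using cross-attention maps for automated, annotation-free object placement. The abstract does not say how a placement region is derived from such a map, so the sketch below shows only one plausible reading: min-max normalise a 2-D cross-attention map for the object token, threshold it, and take the bounding box of the surviving cells. The function name `placement_box` and the default threshold of 0.5 are hypothetical choices, not GeCHO's code.

```python
from typing import List, Optional, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in map-cell indices, inclusive
Box = Tuple[int, int, int, int]


def placement_box(attn: Sequence[Sequence[float]], threshold: float = 0.5) -> Optional[Box]:
    """Bounding box of cells whose min-max-normalised attention is >= threshold.

    Hypothetical helper for illustration; not GeCHO's actual placement procedure.
    """
    flat = [v for row in attn for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return None  # uniform map: no salient region to place the object in
    xs: List[int] = []
    ys: List[int] = []
    for y, row in enumerate(attn):
        for x, v in enumerate(row):
            if (v - lo) / (hi - lo) >= threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # a threshold above 1.0 keeps no cells
    return (min(xs), min(ys), max(xs), max(ys))


attn = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 1.0, 0.0],
    [0.0, 0.2, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(placement_box(attn))  # (1, 1, 2, 2)
```

In a real pipeline the map would come from a diffusion model's cross-attention layers and would be upsampled to image resolution before the box is used, for example to build an inpainting mask; both steps are omitted here.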

  • Research Article
  • 10.1109/tmm.2025.3632627
ASK-HOI: Affordance-Scene Knowledge Prompting for Human-Object Interaction Detection
  • Jan 1, 2026
  • IEEE Transactions on Multimedia
  • Dongpan Chen + 5 more

Human-object interaction (HOI) detection aims to learn how humans interact with surrounding objects by inferring fine-grained triplets $\langle \textit{human, action, object} \rangle$, and it plays a vital role in computer vision tasks such as human-centered scene understanding and visual question answering. However, HOI detection suffers from long-tailed class distributions and zero-shot problems. Current methods typically identify HOIs only from input images or label spaces in a data-driven manner and lack sufficient knowledge prompts, which limits their potential in real-world scenes. To fill this gap, this paper introduces affordance and scene knowledge as prompts at different granularities to the HOI detector to improve its recognition ability. Concretely, we first construct a large-scale affordance-scene knowledge graph, named ASKG, whose knowledge falls into two categories according to the field of image information: knowledge related to the affordances of object instances and knowledge associated with the scene. Next, an ASKG-based prior knowledge embedding module extracts the affordance and scene knowledge specific to the input image. Since this knowledge corresponds to the image at different granularities, we then propose an instance field adaptive fusion module and a scene field adaptive fusion module that enable visual features to fully absorb the knowledge prompts. Finally, the encoded features of the two fields and the knowledge embeddings are fed into a proposed HOI recognition module to predict more accurate HOI results. Extensive experiments on both HICO-DET and V-COCO benchmarks demonstrate that the proposed method achieves competitive results compared with state-of-the-art methods.
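
The <human, action, object> triplet is also the unit of evaluation on HICO-DET: a predicted triplet is counted as correct only when its action and object category match a ground-truth triplet and both its human box and its object box overlap the corresponding ground-truth boxes with an IoU of at least 0.5. A minimal sketch of that matching rule follows; the class and function names are hypothetical, and details of the official evaluation code (such as pixel-offset conventions in the IoU) are not reproduced.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass(frozen=True)
class HOITriplet:
    human_box: Box
    object_box: Box
    action: str
    object_category: str


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches(pred: HOITriplet, gt: HOITriplet, thr: float = 0.5) -> bool:
    """True if labels agree and BOTH boxes reach the IoU threshold."""
    return (
        pred.action == gt.action
        and pred.object_category == gt.object_category
        and iou(pred.human_box, gt.human_box) >= thr
        and iou(pred.object_box, gt.object_box) >= thr
    )


gt = HOITriplet((0, 0, 10, 10), (20, 20, 30, 30), "ride", "bicycle")
pred = HOITriplet((0, 0, 10, 8), (22, 20, 30, 30), "ride", "bicycle")
print(matches(pred, gt))  # True (both IoUs are 0.8)
```

Requiring both boxes to match is what makes HOI detection stricter than ordinary object detection: a correct verb with a poorly localised object still counts as a miss.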

  • Research Article
  • 10.1109/access.2026.3659132
Bridging Detection Architectures With Foundation Models: A Unified Framework for Human–Object Interaction Detection
  • Jan 1, 2026
  • IEEE Access
  • Junwen Chen + 1 more

Human–Object Interaction Detection (HOID) has benefited greatly from advances in modern detection architectures and vision-language foundation models. In this paper, we present two progressively improved HOID frameworks—SOV-STG-VLA and Hybrid-SOV—that jointly push the frontier of accurate and efficient interaction understanding. SOV-STG-VLA reformulates HOI prediction as a subject–object–verb (SOV) decoding problem and introduces a Split Target Guided (STG) denoising strategy that accelerates convergence while enhancing structural consistency. Furthermore, a Vision–Language Advisor (VLA) integrates priors from large multimodal models to enrich verb semantics and improve HOI classification. Building upon this, Hybrid-SOV aligns HOID with the latest object-detection paradigms by incorporating an efficient hybrid encoder and a query-selection mechanism that directly constructs HOI queries from visual features, eliminating predefined embeddings and enabling more interpretable decoding. When coupled with DINO-v3 foundation features, Hybrid-SOV achieves state-of-the-art accuracy with superior inference efficiency. Extensive experiments on HICO-DET and V-COCO demonstrate that our frameworks not only advance HOID performance but also establish an effective path toward bridging detection architectures with vision-language foundation models.
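
Hybrid-SOV is described as constructing HOI queries directly from visual features through a query-selection mechanism instead of predefined embeddings. A common pattern for query selection in DETR-family detectors is to score every encoder token and keep the k highest-scoring tokens as initial decoder queries. The sketch below assumes the per-token scores already exist (for example from a hypothetical scoring head) and shows only the selection step; it is not the authors' implementation.

```python
import heapq
from typing import List, Sequence, Tuple


def select_queries(
    features: Sequence[Sequence[float]], scores: Sequence[float], k: int
) -> Tuple[List[int], List[List[float]]]:
    """Keep the k encoder tokens with the highest scores as initial decoder queries.

    `scores` is assumed to come from some per-token scoring head (hypothetical here).
    Returns the selected token indices and copies of their feature vectors.
    """
    top = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])
    return top, [list(features[i]) for i in top]


feats = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
idx, queries = select_queries(feats, [0.1, 0.9, 0.4, 0.7], k=2)
print(idx, queries)  # [1, 3] [[1.0, 1.0], [3.0, 3.0]]
```

Because the queries are taken from the image's own features, each one can be traced back to a specific location in the encoder output, which is one reason such designs are described as more interpretable than learned, image-independent query embeddings.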

  • Research Article
  • 10.1007/s44267-025-00102-0
Visual-guided human-object interaction detection
  • Dec 1, 2025
  • Visual Intelligence
  • Fang Nan + 4 more

The aim of human-object interaction (HOI) detection is to identify triplets consisting of a human, a verb, and an object. Although existing methods leverage vision-language models (e.g., CLIP) to transfer textual information to unseen compositions, they often fail to capture the fine-grained visual cues that are essential for complex interactions, such as spatial configurations and object affordances. In this paper, we introduce visual guidance as an alternative approach. We define, for the first time, a visual-guided HOI detection task, which aims to detect unseen HOI categories using a small number of guidance examples. To support this new task, we construct a new benchmark dataset containing one base set and four novel sets, designed around the peculiarities of HOI. We then propose a VG-HOI model with progressive guidance, query reconstruction, and a conditional uncoupling decoder that supplements common HOI knowledge with task-specific cues to improve generalization. In addition, we explore a new guidance sampling strategy, disentangled guidance, for real-world scenarios. Our in-depth analysis of the experimental results shows that the proposed model generalizes better when detecting visual-guided HOI.

  • Research Article
  • Cited by 2
  • 10.1016/j.neucom.2025.130882
Exploring interaction concepts for human–object-interaction detection via global- and local-scale enhancing
  • Oct 1, 2025
  • Neurocomputing
  • Tianlun Luo + 6 more

  • Research Article
  • Cited by 2
  • 10.1016/j.neucom.2025.130709
Simple yet effective: An explicit query-based relation learner for human–object-interaction detection
  • Oct 1, 2025
  • Neurocomputing
  • Tianlun Luo + 6 more

  • Research Article
  • Cited by 1
  • 10.1109/tcyb.2025.3587037
Interaction-Aware Transformer Network for Human-Object Interaction Detection
  • Sep 1, 2025
  • IEEE Transactions on Cybernetics
  • Weibo Jiang + 5 more

Human-object interaction (HOI) detection tackles the joint localization and classification of HOIs. Recent HOI detection methods are mainly based on transformer networks, in which explicit object-level priors (e.g., scene layout, object appearance, or category) are usually fed into the transformer to strengthen the object queries. Although these methods have achieved remarkable results, they pay insufficient attention to implicit action-level information, which is the fundamental element of HOI. In this work, we propose an interaction-aware transformer network (IATN) that obtains an interaction-aware query by jointly utilizing implicit action-level priors and explicit object-level priors. Specifically, we design an action-aware module (AAM) to aggregate implicit action priors from the scene level and the instance level. We then design an action-oriented graph (AOG), whose nodes are human and object features and whose edges represent action semantics, to aggregate priors jointly from the action level and the object level. The resulting interaction-aware query is then used to obtain the HOI predictions. In addition, we leverage knowledge distillation to enhance the action-level priors by transferring the final HOI predictions to the intermediate features. Extensive experiments on the HICO-DET and V-COCO datasets verify the effectiveness of the proposed interaction-aware model.
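
IATN uses knowledge distillation to transfer the final HOI predictions to intermediate features, but the abstract does not give the loss. As a generic stand-in, the sketch shows the classic temperature-scaled distillation loss: the KL divergence between softened teacher and student distributions, multiplied by the squared temperature. HOI classification is often multi-label (one sigmoid per class), so the actual IATN loss may well differ; the function names and the default temperature are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """T^2 * KL(teacher || student) on temperature-softened distributions (Hinton-style KD).

    Generic stand-in for illustration; not IATN's published loss.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


print(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```

Here the "teacher" would be the final HOI prediction and the "student" a prediction read out from intermediate features, so the loss pushes earlier layers toward the network's own final output.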

  • Research Article
  • Cited by 1
  • 10.1016/j.patcog.2025.112242
CORE-CLIP: Smart Collaborative Reasoning Driven by CLIP for Human-Object Interaction Detection
  • Aug 1, 2025
  • Pattern Recognition
  • Yuequan Yang + 5 more

  • Research Article
  • 10.1007/s10489-025-06730-9
Human-object interaction detection based on adaptive contrastive learning and class-specific feature enhancement
  • Jul 10, 2025
  • Applied Intelligence
  • Huanchun Peng + 7 more

  • Research Article
  • Cited by 2
  • 10.1016/j.neunet.2025.107348
Towards zero-shot human-object interaction detection via vision-language integration
  • Jul 1, 2025
  • Neural Networks: The Official Journal of the International Neural Network Society
  • Weiying Xue + 5 more

  • Research Article
  • 10.3390/info16060474
Enhancing Rare Class Performance in HOI Detection with Re-Splitting and a Fair Test Dataset
  • Jun 6, 2025
  • Information
  • Gyubin Park + 1 more

In Human–Object Interaction (HOI) detection, class imbalance severely limits model performance on infrequent interaction categories. To overcome this problem, we develop a Re-Splitting algorithm that applies DreamSim-based clustering and k-means-based partitioning to restructure the train–test splits. In doing so, the approach balances rare and frequent interaction classes, thereby increasing robustness. We also introduce a Real-World test dataset that approximates a truly independent benchmark and is designed to address the class distribution bias commonly present in traditional test sets. As shown in the Experiment and Evaluation subsection, a high level of performance can be achieved in the general case using different few-shot and rare-class training instances. Models trained solely on the re-split dataset show significant improvements in rare-class mAP, particularly one-stage models. Evaluation on the real-world test dataset further highlights previously overlooked aspects of model performance and supports fairer dataset construction. The methods are validated with extensive experiments using five one-stage and two two-stage models. Our analysis shows that reshaping dataset distributions improves rare-class detection by as much as 8.0 mAP. This study paves the way for balanced training and evaluation and toward a general framework for scalable, fair, and generalizable HOI detection.
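
The abstract names the ingredients of Re-Splitting (DreamSim-based clustering and k-means partitioning) but not the exact assignment rule. The sketch below is a hypothetical cluster-stratified split under that reading: for each interaction class, run k-means on precomputed image embeddings (DreamSim itself is not called; the vectors are assumed given), then hold out a fraction of every cluster so the test split covers each visual mode and every class with at least two samples appears in both splits. All names and defaults are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vec], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def resplit(
    embeddings_by_class: Dict[str, List[Vec]], k: int = 2, test_frac: float = 0.2
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Hypothetical cluster-stratified train/test split; returns (class, index) pairs."""
    train: List[Tuple[str, int]] = []
    test: List[Tuple[str, int]] = []
    for cls, embs in embeddings_by_class.items():
        kk = min(k, len(embs))
        labels = kmeans(embs, kk)
        for c in range(kk):
            idx = [i for i, lab in enumerate(labels) if lab == c]
            n_test = int(len(idx) * test_frac)
            if len(idx) >= 2:
                # At least one test sample, but always keep one for training.
                n_test = min(max(1, n_test), len(idx) - 1)
            test += [(cls, i) for i in idx[:n_test]]
            train += [(cls, i) for i in idx[n_test:]]
    return train, test
```

Singleton clusters stay entirely in training in this sketch; how the paper handles classes too rare to split is not stated in the abstract.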

  • Research Article
  • Cited by 1
  • 10.1007/s11227-025-07308-5
Comprehensive context learning for two-stage human-object interaction detection
  • May 3, 2025
  • The Journal of Supercomputing
  • Limin Xia + 1 more

  • Research Article
  • 10.1007/s11263-025-02445-z
Interaction Confidence Attention for Human–Object Interaction Detection
  • Apr 28, 2025
  • International Journal of Computer Vision
  • Hong-Bo Zhang + 5 more

  • Research Article
  • Cited by 1
  • 10.1609/aaai.v39i9.32972
HOIMamba: Efficient Mamba-based Disentangled Progressive Learning for HOI Detection
  • Apr 11, 2025
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Yongchao Xu + 4 more

Human-object interaction (HOI) detection aims to detect the spatial positions of human-object pairs and recognize their interactions. Existing single-branch, two-branch, and three-branch methods struggle to strike an appropriate trade-off among efficiency, multi-task decoupling, and collaborative learning, and they also fail to identify rare and complex interaction categories effectively. In this work, we propose HOIMamba, a novel Efficient Mamba-based Disentangled Progressive Learning framework for HOI detection that absorbs the advantages of the three existing approaches and adaptively aggregates multi-level interaction semantics guided by cross-task bidirectional information contexts. Specifically, HOIMamba builds an efficient and effective decoder through cascaded Low-Rank Adaptations (LoRAs), achieving high efficiency, thorough task decoupling, and good multi-task collaborative learning. Furthermore, to ease the recognition of interactions in difficult HOI samples, we design a novel Mamba-based comprehensive progressive learning strategy with Cross-enhance Mamba (CEM) blocks and Detection Context Propagation (DCP) blocks that gradually mines interaction-related discriminative cues at four levels. CEM blocks automatically aggregate context to generate diverse task-shared semantics while realizing cross-task interaction between the human and object branches, guiding the interaction branch to extract more expressive HOI representations. DCP blocks further transfer the comprehensive interaction context to the human and object branches to achieve rich and effective information exchange, helping the model discover more HOI instances. Extensive experimental results on two standard benchmarks demonstrate the effectiveness of HOIMamba.
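
HOIMamba's decoder is built from cascaded Low-Rank Adaptations. The sketch shows only the standard LoRA update for a single linear layer, y = W x + (alpha / r) * B (A x), where W (d_out x d_in) is frozen and only the low-rank factors A (r x d_in) and B (d_out x r) are trained; it does not reproduce how HOIMamba cascades or places these adapters. Plain Python lists stand in for tensors, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float, r: int) -> Vector:
    """y = W x + (alpha / r) * B (A x).

    W: frozen base weight (d_out x d_in); A: r x d_in; B: d_out x r (trainable).
    """
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))  # rank-r update, never materialised as d_out x d_in
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]      # r = 1
B = [[1.0], [0.0]]
print(lora_forward(W, A, B, [2.0, 3.0], alpha=2.0, r=1))  # [12.0, 3.0]
```

Because B is commonly initialised to zeros, an adapted layer starts out computing exactly W x, and the adapter only changes the output once B is trained; with r much smaller than d_in and d_out, each adapter adds few parameters, which is the efficiency argument behind LoRA-based decoders.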

  • Research Article
  • Cited by 3
  • 10.1609/aaai.v39i4.32411
ContextHOI: Spatial Context Learning for Human-Object Interaction Detection
  • Apr 11, 2025
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Mingda Jia + 3 more

Spatial contexts, such as backgrounds and surroundings, are considered critical in Human-Object Interaction (HOI) recognition, especially when the instance-centric foreground is blurred or occluded. Recent HOI detectors are usually built upon detection transformer pipelines. While such an object-detection-oriented paradigm shows promise in localizing objects, its exploration of spatial context is often insufficient for accurately recognizing human actions. To enhance the capabilities of object detectors for HOI detection, we present a dual-branch framework named ContextHOI, which efficiently captures both object detection features and spatial contexts. In the context branch, we train the model to extract informative spatial context without requiring additional hand-crafted background labels. Furthermore, we introduce context-aware spatial and semantic supervision to the context branch to filter out irrelevant noise and capture informative contexts. ContextHOI achieves state-of-the-art performance on the HICO-DET and V-COCO benchmarks. For further validation, we construct a novel benchmark, HICO-ambiguous, a subset of HICO-DET containing images with occluded or impaired instance cues. Extensive experiments across all benchmarks, complemented by visualizations, underscore the improvements provided by ContextHOI, especially in recognizing interactions involving occluded or blurred instances.
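
Results such as ContextHOI's are reported as mean average precision (mAP) on HICO-DET and V-COCO. A common way to compute per-category AP is the PASCAL VOC-style all-point interpolated area under the precision-recall curve, with mAP the mean over categories. The sketch assumes each detection has already been labelled a true or false positive (duplicates of an already matched ground truth counting as false positives); it is not the official evaluation code of either benchmark, and the function names are hypothetical.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], is_tp: Sequence[bool], num_gt: int) -> float:
    """All-point interpolated AP for one category."""
    if num_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for i in order:  # walk detections from most to least confident
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at each rank becomes the max precision at any later rank.
    for j in range(len(precisions) - 2, -1, -1):
        precisions[j] = max(precisions[j], precisions[j + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p  # only true positives increase recall
        prev_recall = r
    return ap


def mean_ap(per_class_ap: Sequence[float]) -> float:
    return sum(per_class_ap) / len(per_class_ap)


print(average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2))  # ~0.8333
```

On HICO-DET, the same mean is also reported separately over the Rare categories (fewer than 10 training instances) and the Non-Rare ones, which is where long-tail methods in this list usually show their gains.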

  • Research Article
  • 10.52783/jisem.v10i30s.4891
ReCap Pro: Caption Correction using Meta Learning
  • Mar 31, 2025
  • Journal of Information Systems Engineering and Management
  • Sakshi Birthi

This article presents ReCap Pro, a framework that corrects auto-generated captions by addressing possible errors in the nouns and verbs of a caption. While caption correction has been attempted before, it has not previously been approached with meta-learning. The work described in this article uses few-shot learning, enabling faster learning from fewer image samples and thereby addressing one of the critical limitations of traditional data-intensive caption generation models. An object detection model trained with Reptile meta-learning is employed to detect the correct nouns, and a human-object interaction (HOI) detection model trained with Prototypical Networks is used to detect the verbs in the image. Because existing caption generation models rely on large amounts of training data, the proposed method can be applied as an additional performance-enhancing layer on top of current caption generators to overcome this long-standing shortcoming.
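
ReCap Pro trains its HOI verb detector with Prototypical Networks. In that method, each class prototype is the mean embedding of the class's few support examples, and a query is assigned to the class whose prototype is nearest in squared Euclidean distance, with a softmax over negative distances giving class probabilities. The sketch assumes the embeddings were already produced by some encoder and uses hypothetical verb labels; it illustrates the classification rule, not ReCap Pro's code.

```python
import math
from typing import Dict, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compute_prototypes(support: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Prototype = mean of the support embeddings of each class."""
    return {
        label: [sum(dim) / len(embs) for dim in zip(*embs)]
        for label, embs in support.items()
    }


def class_probabilities(query: Sequence[float], protos: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over negative squared distances to each prototype."""
    logits = {label: -sq_dist(query, p) for label, p in protos.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - m) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def classify(query: Sequence[float], protos: Dict[str, List[float]]) -> str:
    probs = class_probabilities(query, protos)
    return max(probs, key=probs.get)


support = {"ride": [[0.0, 0.0], [0.0, 2.0]], "hold": [[10.0, 10.0], [12.0, 10.0]]}
protos = compute_prototypes(support)  # ride -> [0.0, 1.0], hold -> [11.0, 10.0]
print(classify([1.0, 1.0], protos))  # ride
```

New verbs can be added at test time simply by supplying a few support embeddings for them, which is what makes this rule suited to the few-shot setting the article targets.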

  • Research Article
  • Cited by 10
  • 10.1007/s44267-025-00074-1
Enhancing human-centered dynamic scene understanding via multiple LLMs collaborated reasoning
  • Mar 17, 2025
  • Visual Intelligence
  • Hang Zhang + 3 more

Human-centered dynamic scene understanding plays a pivotal role in enhancing the capabilities of robotic and autonomous systems. Within it, video-based human-object interaction (V-HOI) detection is a crucial semantic scene understanding task that aims to comprehensively understand HOI relationships within a video to support the behavioral decisions of mobile robots and autonomous driving systems. Although previous V-HOI detection models have made significant advances in accurate detection on specific datasets, they still lack the general, human-like reasoning ability needed to infer HOI relationships effectively. In this study, we propose V-HOI multi-LLMs collaborated reasoning (V-HOI MLCR), a novel framework consisting of a series of plug-and-play modules that can improve the performance of current V-HOI detection models by leveraging the strong reasoning ability of different off-the-shelf pre-trained large language models (LLMs). We design a two-stage collaboration system of different LLMs for the V-HOI task. In the first stage, a cross-agent reasoning scheme uses the LLM to reason from different aspects. In the second stage, we conduct a multi-LLM debate to reach the final answer, drawing on the different knowledge held by different LLMs. Additionally, we develop an auxiliary training strategy using CLIP, a large vision-language model, to enhance the base V-HOI models' discriminative ability so that they cooperate better with LLMs. We validate our design by demonstrating that reasoning from multiple perspectives improves the predictive accuracy of the base V-HOI model.
