
Related Topics

  • Class Of Objects
  • Object Instances
  • Target Object
  • Unknown Objects
  • Occluded Objects
  • Object Categories

Articles published on Unseen Objects

436 Search results
  • Research Article
  • Cited by 1
  • 10.1145/3787522
AerOSeg++: Scale-Aware and Texture-Guided Open-Vocabulary Segmentation with SAM Features for Remote Sensing Images
  • Apr 20, 2026
  • ACM Transactions on Multimedia Computing, Communications, and Applications
  • Saikat Dutta + 3 more

Remote sensing image segmentation poses significant challenges in generalizing to categories that are unseen until the evaluation phase. Existing open-vocabulary segmentation methods, primarily designed for natural images, struggle to cope with the spatial complexity, scale variation, and high-resolution characteristics of remote sensing imagery. Specifically, scale variations during inference can degrade performance, as the model tends to overfit to fixed-scale patterns encountered during training. This also affects the model's ability to recognize unseen or novel-class objects that appear at varying sizes or resolutions during testing. These limitations motivate open-vocabulary segmentation methods that address the challenges of geospatial images. In this work, we introduce AerOSeg++, an open-vocabulary segmentation method for remote sensing that focuses on scale-invariant feature learning. We first compute robust image-text correlation features using rotated input images and domain-specific prompts. These are refined via spatial and class refinement blocks, guided by SAM features to enhance spatial consistency. To upscale the refined correlation features, we propose a multi-scale decoder framework that fuses fine-grained texture features with SAM-derived features. By leveraging texture information across multiple receptive fields, AerOSeg++ effectively captures scale-consistent patterns, facilitating accurate segmentation of objects across varying spatial resolutions. Additionally, our training pipeline incorporates ScaleDrop, a computationally efficient, parameter-free feature rescaling module designed to ensure scale-invariant feature representation learning. Our proposed model shows significant performance gains over state-of-the-art open-vocabulary methods on three remote sensing benchmark datasets: iSAID, DLRSD, and OpenEarthMap.
These results highlight the effectiveness of our scale-invariant design and texture-guided multi-scale feature upsampling in handling the challenges of open-vocabulary segmentation in remote sensing imagery.

  • Research Article
  • 10.1109/tnnls.2026.3678945
Causal Counterfactual Inference Network for Video Object State Changes in Open-World Scenarios.
  • Apr 14, 2026
  • IEEE Transactions on Neural Networks and Learning Systems
  • Zhichao Wang + 3 more

Object state changes (OSCs) play a critical role in video understanding; the OSC task is to localize the stages of a state transition within a temporal sequence. However, existing methods face two key challenges in open-world scenarios. First, dataset bias causes a significant imbalance between background and causal scenes, which leads models to rely on irrelevant features and degrades prediction. Second, existing methods generalize poorly to unseen objects: they typically focus on a single state change of a specific object, which prevents them from understanding the state changes of unseen objects in a generalized way, as humans do. To address these challenges, we first introduce a structural causal model (SCM) to formalize the OSC task, which explicitly defines the confounding effect of dataset bias and the lack of generalization. Guided by this SCM, we propose CCI-Net, a causal counterfactual inference-based video OSC neural network. CCI-Net employs a causal inference network for backdoor adjustment to effectively eliminate confounders. In addition, it integrates counterfactual inference to enhance understanding in open-world scenarios. Specifically, CCI-Net comprises two key components: the backdoor scene classifier (BSC) and the counterfactual module (CM). The BSC controls potential confounders and mitigates spurious correlations. The CM enhances generalization to unseen objects and their state changes by constructing counterfactual scenes during training. Furthermore, we design two loss functions, for causal and counterfactual scenes, to optimize the learning process. Experimental results on three benchmark datasets demonstrate that, compared with existing methods, CCI-Net significantly improves both precision and generalization in open-world scenarios.
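The backdoor adjustment that CCI-Net builds on has a standard closed form for a discrete confounder Z: P(Y | do(X = x)) = Σ_z P(Y | X = x, Z = z) P(Z = z). The Python sketch below evaluates that textbook formula on a toy table. The scene labels, action name, and probabilities are hypothetical, and this is not the authors' backdoor scene classifier, which operates on learned video features.

```python
from typing import Dict, Tuple


def backdoor_adjust(
    p_y_given_xz: Dict[Tuple[str, str], Dict[str, float]],
    p_z: Dict[str, float],
    x: str,
) -> Dict[str, float]:
    """Return the interventional distribution P(Y | do(X=x))."""
    result: Dict[str, float] = {}
    for z, pz in p_z.items():
        # Each stratum is weighted by the marginal prior P(z), not by P(z | x)
        # as ordinary conditioning would; this is what removes the confounding.
        for y, p in p_y_given_xz[(x, z)].items():
            result[y] = result.get(y, 0.0) + p * pz
    return result


# Hypothetical toy example: Z is the background scene type.
p_z = {"kitchen": 0.5, "outdoor": 0.5}
p_y_given_xz = {
    ("cut", "kitchen"): {"changed": 0.9, "unchanged": 0.1},
    ("cut", "outdoor"): {"changed": 0.5, "unchanged": 0.5},
}
interventional = backdoor_adjust(p_y_given_xz, p_z, "cut")
```

Here `interventional["changed"]` is 0.9 · 0.5 + 0.5 · 0.5, i.e. 0.7 up to floating-point rounding.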

  • Research Article
  • 10.1016/j.robot.2026.105334
Enhancing sampling-based planning with a library of paths
  • Apr 1, 2026
  • Robotics and Autonomous Systems
  • Michal Minařík + 2 more

Path planning for 3D solid objects is a challenging problem that requires searching a six-dimensional configuration space, yet it is essential in many robotic applications such as bin-picking and assembly. Commonly used sampling-based planners, such as Rapidly-exploring Random Trees, struggle with narrow passages, where the sampling probability is low, which increases the time needed to find a solution. In scenarios like robotic bin-picking, various objects must be transported through the same environment, yet traditional planners start from scratch each time, discarding valuable information gained during earlier planning. We address this by using a library of past solutions, which allows previous experience to be reused even when planning for a new, previously unseen object. Paths for a set of objects are stored; when planning for a new object, we find the most similar object in the library and use its paths as approximate solutions, adjusting for possible mutual transformations. The configuration space is then sampled along the approximate paths. Our method is tested in various narrow-passage scenarios and compared with state-of-the-art methods from the OMPL library. Results show significant speed improvements (up to an 85% decrease in the required time), and our method often finds a solution in cases where the other planners fail. Our implementation of the proposed method is released as an open-source package.
Highlights:
  • Reuses previous planning experience even with new, unseen objects.
  • Up to 85% reduction in planning time in narrow-passage scenarios.
  • Released as an open-source package with prepared examples.
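The retrieve-then-guide idea can be sketched as a nearest-neighbour lookup over object descriptors, followed by generating configurations along the retrieved path to bias where a planner samples. Everything here is a hypothetical stand-in: the descriptor vectors, Euclidean similarity, and linear interpolation are assumptions, the adjustment for mutual transformations is omitted, and this is neither the authors' open-source package nor an OMPL API.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Config = Tuple[float, ...]  # (x, y, z, roll, pitch, yaw)


@dataclass
class LibraryEntry:
    name: str
    descriptor: Sequence[float]  # hypothetical shape descriptor
    paths: List[List[Config]] = field(default_factory=list)


def most_similar(library: List[LibraryEntry], query: Sequence[float]) -> LibraryEntry:
    """Nearest neighbour by Euclidean distance between descriptors."""
    return min(library, key=lambda e: math.dist(e.descriptor, query))


def guide_samples(path: List[Config], n_per_segment: int) -> List[Config]:
    """Interpolate along an approximate path to get guiding configurations.

    Linear interpolation of the angles is a simplification; a real planner
    would interpolate rotations properly (for example, with quaternions)."""
    samples: List[Config] = []
    for a, b in zip(path, path[1:]):
        for k in range(n_per_segment):
            t = k / n_per_segment
            samples.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    samples.append(path[-1])
    return samples


# Hypothetical two-object library.
library = [
    LibraryEntry("peg", [1.0, 0.2], [[(0, 0, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0)]]),
    LibraryEntry("box", [0.3, 0.9], [[(0, 0, 0, 0, 0, 0), (0, 1, 0, 0, 0, 0)]]),
]
best = most_similar(library, [0.9, 0.25])
samples = guide_samples(best.paths[0], n_per_segment=4)
```

In a full planner, these guiding configurations would seed or bias the sampler instead of replacing collision checking.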

  • Research Article
  • 10.3390/s26061755
HSG-ON: Hierarchical Scene Graph-Based Object Navigation.
  • Mar 10, 2026
  • Sensors (Basel, Switzerland)
  • Seokjoon Kwon + 2 more

For a robot to operate effectively in human-centric environments, finding objects based on natural language is essential. Zero-shot object goal navigation is a significant challenge where robots must find unseen objects in new environments without prior knowledge. Existing methods often struggle with strategic exploration, leading to inefficient searches. In this study, we propose a hierarchical scene graph-based navigation system to address this challenge. Our core innovations are twofold: dynamically constructing a three-layer "room-workspace-object" hierarchical scene graph without manually pre-tuned parameters, and introducing a novel workspace-based searching strategy. By evaluating semantic relevance at the workspace level rather than the object level, the robot infers probable containers for a target, enabling focused, human-like exploration. Simulation results demonstrate that our system significantly outperforms existing state-of-the-art methods. Quantitatively, our approach improves the Success Rate (SR) by 26.8% (SR 0.4859) under distance-constrained settings and by 20.2% (SR 0.7360) under unconstrained settings, compared to the best baselines. These results validate that our framework offers a robust solution for zero-shot object goal navigation.
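A minimal sketch of the workspace-level idea: the three dataclass layers mirror the room-workspace-object hierarchy, and candidates are ranked per workspace rather than per object. It assumes a hypothetical `relevance(target, workspace_label)` scorer. In HSG-ON that judgment comes from semantic reasoning, which the toy lookup table below only stands in for.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Workspace:
    label: str                                        # e.g. "kitchen counter"
    objects: List[str] = field(default_factory=list)  # objects observed so far


@dataclass
class Room:
    label: str
    workspaces: List[Workspace] = field(default_factory=list)


def rank_workspaces(
    rooms: List[Room], target: str, relevance: Callable[[str, str], float]
) -> List[Tuple[float, str, str]]:
    """Score each workspace (not each object) for the target; best first."""
    scored = [
        (relevance(target, ws.label), room.label, ws.label)
        for room in rooms
        for ws in room.workspaces
    ]
    return sorted(scored, reverse=True)


# Hypothetical relevance values standing in for a language-model judgment.
TABLE = {("mug", "kitchen counter"): 0.9, ("mug", "desk"): 0.5, ("mug", "bed"): 0.1}

rooms = [
    Room("kitchen", [Workspace("kitchen counter", ["kettle"])]),
    Room("bedroom", [Workspace("bed"), Workspace("desk", ["lamp"])]),
]
ranking = rank_workspaces(rooms, "mug", lambda t, w: TABLE.get((t, w), 0.0))
best_score, best_room, best_workspace = ranking[0]
```

The robot would then explore the top-ranked workspace first, which is what makes the search focused rather than exhaustive.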

  • Research Article
  • 10.3390/electronics15051111
ProM-Pose: Language-Guided Zero-Shot 9-DoF Object Pose Estimation from RGB-D with Generative 3D Priors
  • Mar 7, 2026
  • Electronics
  • Yuchen Li + 3 more

Object pose estimation is fundamental for robotic manipulation, autonomous driving, and augmented reality, yet recovering the full 9-DoF state (rotation, translation, and anisotropic 3D scale) from RGB-D observations remains challenging for previously unseen objects. Existing methods either rely on instance-specific CAD models or predefined category boundaries, or suffer from scale ambiguity under sparse observations. We propose ProM-Pose, a unified cross-modal temporal perception framework for zero-shot 9-DoF object pose estimation. By integrating language-conditioned generative 3D shape priors as canonical geometric references, an asymmetric cross-modal attention mechanism for spatially aware fusion, and a decoupled pose decoding strategy with temporal refinement, ProM-Pose constructs metrically consistent and semantically grounded representations without relying on category-specific pose priors or instance-level CAD supervision. Extensive experiments on the CAMERA25 and REAL275 benchmarks demonstrate that ProM-Pose achieves competitive or superior performance compared to category-level methods, with mAP of 75.0% at 5°, 2 cm and 90.5% at 10°, 5 cm on CAMERA25, and 42.2% at 5°, 2 cm and 76.0% at 10°, 5 cm on REAL275 under zero-shot cross-domain evaluation. Qualitative results in real-world logistics scenarios further validate temporal stability and robustness under occlusion and lighting variations. ProM-Pose effectively bridges semantic grounding and metric geometric reasoning within a unified formulation, enabling stable and scale-aware 9-DoF pose estimation for previously unseen objects under open-vocabulary conditions.
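The 5°, 2 cm and 10°, 5 cm figures are threshold criteria: a predicted pose counts as correct only if both its rotation error and its translation error fall under the thresholds, and mAP is then computed over detections using that criterion. The Python sketch below shows the per-prediction check and the 9-DoF mapping R·(s ⊙ p) + t from a canonical point to the camera frame. It assumes translations in metres and ignores the symmetry handling (for example, for bottles and bowls) applied in standard CAMERA25/REAL275 evaluation, so it illustrates the idea rather than reproducing the benchmark's scoring code.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rotation_error_deg(r_pred: Matrix, r_gt: Matrix) -> float:
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    # trace(R_pred^T R_gt) equals the elementwise sum of R_pred * R_gt.
    tr = sum(r_pred[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp round-off
    return math.degrees(math.acos(cos_theta))


def within_threshold(
    r_pred: Matrix, t_pred: Sequence[float],
    r_gt: Matrix, t_gt: Sequence[float],
    max_deg: float, max_cm: float,
) -> bool:
    """True if rotation and translation errors are both under the thresholds.

    Translations are assumed to be in metres."""
    t_err_cm = 100.0 * math.dist(t_pred, t_gt)
    return rotation_error_deg(r_pred, r_gt) < max_deg and t_err_cm < max_cm


def apply_9dof(r: Matrix, t: Sequence[float], s: Sequence[float],
               p: Sequence[float]) -> List[float]:
    """Map a canonical point into the camera frame: R @ (s * p) + t."""
    scaled = [s[k] * p[k] for k in range(3)]  # anisotropic per-axis scale
    return [sum(r[i][k] * scaled[k] for k in range(3)) + t[i] for i in range(3)]
```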

  • Research Article
  • 10.1186/s13677-026-00872-y
LMM-guided knowledge distillation for power operation object detection in cloud-edge environment
  • Mar 6, 2026
  • Journal of Cloud Computing
  • Bingyang Li + 4 more

Power-grid field operations demand real-time visual monitoring to verify personal protective equipment and tool usage under a large depth of field. Conventional real-time detectors are efficient but closed-vocabulary, so they struggle with rare or unseen objects. Large multimodal models (LMMs) offer prompt-guided open-vocabulary understanding, yet they are too heavy for edge deployment. To address these challenges, we propose an LMM-guided distillation framework that transfers prompt-grounded semantics from a large teacher to a lightweight YOLO-style student. The teacher, queried with an expanded prompt set, produces pseudo-labels and region–text embeddings. The student is trained with a standard detection objective plus three semantic transfers. First, feature distillation aligns student features to teacher region embeddings via a linear projector; second, prompt-aware logit distillation matches student logits to the teacher's temperature-smoothed prompt distribution; and third, vision–language contrastive alignment ties projected student regions to the correct prompt embedding. Experiments on two benchmark datasets show consistent gains on both common and rare categories while retaining real-time throughput on edge hardware, demonstrating a practical cloud-to-edge pipeline for safety monitoring.
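The abstract does not spell out the exact logit-distillation loss. A common formulation, assumed here, is the KL divergence between the teacher's and the student's temperature-softened distributions, scaled by T² so that gradient magnitudes stay comparable across temperatures. In the plain-Python sketch below each logit is treated as the score for one prompt; the function names are illustrative, not the authors' code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-softened softmax; higher T gives a smoother distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened prompt distributions, times T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's prompt distribution and positive otherwise; in training it would be added to the detection objective with some weight.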

  • Research Article
  • 10.26599/tst.2026.90100019
O²Exp: Online Object Exploration in Underwater Environment
  • Feb 1, 2026
  • Tsinghua Science and Technology
  • Xingyu Chen + 4 more

The underwater environment contains a wealth of biological and mineral resources, making the deployment of autonomous underwater vehicles (AUVs) essential for exploration and development. Despite years of research in data-driven machine vision techniques, the offline collection of underwater data remains far more difficult than that of terrestrial samples. This paper focuses on online object exploration in underwater environments without manual intervention, including the sub-tasks of closed- and open-set detection, fine-grained novel-class subdivision, and few-shot incremental learning. To address this challenge, we start with a few-shot detector for detecting known classes and propose an open-set detector for exploring novel categories. The open-set detector can model unseen objects with fused semantics-localization cues and discrepancy-enhanced representation. Furthermore, we design detector-driven clustering to subdivide novel objects into an arbitrary number of novel classes as pseudo-labels. Finally, incremental learning is performed to model novel-category representation while maintaining base-class knowledge, where gradient rescaling and knowledge distillation strategies are designed to avoid catastrophic forgetting. Overall, our proposed framework, called O²Exp, can autonomously explore objects in unstructured underwater environments. Extensive experiments with public datasets and real-world tests verify the accuracy, robustness, and practicality of the proposed O²Exp framework.

  • Research Article
  • 10.64898/2025.12.31.697247
SpatialDINO: A Self-Supervised 3D Vision Transformer that enables Segmentation and Tracking in Crowded Cellular Environments
  • Jan 25, 2026
  • bioRxiv
  • Alex Lavaee + 5 more

Quantitative, time-resolved 3D fluorescence microscopy can reveal complex cellular dynamics in living cells and tissues. Broader use remains limited by the difficulty of identifying, segmenting, and tracking objects of different sizes and shapes in crowded intracellular environments in low-contrast, anisotropic, monochromatic image volumes. Objects overlap, deform, appear and disappear, and span wide ranges of size and intensity. Classical segmentation pipelines typically require high signal-to-noise data and rely on intensity heuristics with hand-tuned postprocessing that generalize poorly. Supervised deep learning methods require extensive voxel-level annotations that are costly, inconsistent across phenotypes, and rapidly become obsolete as imaging conditions change. We introduce SpatialDINO, a fully automated self-supervised method that trains a native 3D vision transformer based on a modified version of DINOv2. SpatialDINO yields robust semantic feature maps from single channels of multi-channel microscopy that, irrespective of object shape, support object detection and segmentation directly from naïve 3D images across z-spacings, numbers of planes, and imaging modalities, without retraining or voxel annotations. We trained SpatialDINO on a small set of confocal volumes acquired by live-cell fluorescent 3D lattice light-sheet microscopy, spanning targets of different sizes and shapes located in crowded cellular environments: from diffraction-limited clathrin-coated pits and clathrin-coated vesicles to bigger structures, including endosomes and lysosomes, and endosomes and lysosomes pharmacologically enlarged to highlight endosomal membrane profiles. Post-processing of the features generated by SpatialDINO allows detection and unique identification of these objects in naïve 3D images. It also enables detection of significantly different, previously unseen object classes, such as cellular plasma membranes, nuclei, and even tumors in MRI scans.
Finally, we illustrate its value by tracking endosomes in 3D time series, combining SpatialDINO-derived feature similarity with spatial proximity to improve association through occlusion, abrupt appearance changes, and dense packing, all conditions that have been challenging for existing methods. SpatialDINO therefore lowers a major barrier to quantitative analysis of heterogeneous, monochromatic objects in crowded 3D cellular environments.
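One simple way to combine feature similarity with spatial proximity for frame-to-frame association is a weighted cost with a distance gate, followed by matching. The sketch below is assumption-laden: the weight `w_feat`, the gate `max_dist`, and greedy (rather than globally optimal, e.g. Hungarian) assignment are choices made here, and the toy 2D feature vectors stand in for SpatialDINO embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple

Detection = Tuple[Sequence[float], Sequence[float]]  # (centroid_xyz, feature)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def associate(prev: List[Detection], curr: List[Detection],
              max_dist: float, w_feat: float = 0.5) -> Dict[int, int]:
    """Greedy frame-to-frame matching; returns {prev_index: curr_index}."""
    candidates = []
    for i, (pc, pf) in enumerate(prev):
        for j, (cc, cf) in enumerate(curr):
            d = math.dist(pc, cc)
            if d > max_dist:  # spatial gate: too far to be the same object
                continue
            cost = w_feat * (1.0 - cosine(pf, cf)) + (1.0 - w_feat) * d / max_dist
            candidates.append((cost, i, j))
    matches: Dict[int, int] = {}
    used_curr = set()
    for _, i, j in sorted(candidates):  # cheapest pairs first
        if i not in matches and j not in used_curr:
            matches[i] = j
            used_curr.add(j)
    return matches


# Densely packed case: the nearer detection has a dissimilar feature.
prev = [((0.0, 0.0, 0.0), (1.0, 0.0))]
curr = [((0.5, 0.0, 0.0), (0.0, 1.0)), ((0.6, 0.0, 0.0), (1.0, 0.0))]
dense_match = associate(prev, curr, max_dist=2.0)
```

In the usage case the detection with the matching feature wins even though another detection is slightly closer, which is the behaviour that feature similarity adds over proximity alone.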

  • Research Article
  • 10.1038/s41598-026-36445-x
Few-shot cross-episode adaptive memory for metal surface defect semantic segmentation.
  • Jan 18, 2026
  • Scientific Reports
  • Jiyan Zhang + 5 more

Few-shot semantic segmentation has gained significant attention in metal surface defect detection due to its ability to segment unseen object classes with only a few annotated defect samples. Previous methods constrained to single-episode training suffer from limited adaptability in the semantic description of defect regions and from coarse segmentation granularity. In this paper, we propose an episode-adaptive memory network (EAMNet) that specifically addresses subtle variances between episodes during training. The episode adaptive memory unit (EAMU) leverages an adaptive factor to model semantic dependencies across different episodes. The context adaptation module (CAM) aggregates hierarchical features of support-query pairs for fine-grained segmentation. The proposed global response mask average pooling (GRMAP) introduces global response normalization to obtain fine-grained cues directly from the support prototype. We also introduce attention distillation (AD), which leverages fine-grained semantic attention correspondence to process defect region cues and stabilize the cross-episode adaptation in EAMU. Extensive experiments demonstrate that our approach establishes new state-of-the-art performance on both the Surface Defect-[Formula: see text] and FSSD-12 datasets.
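Few-shot segmentation methods commonly build a class prototype by masked average pooling (MAP) of support features over the annotated region, then compare query features against it. The sketch below shows plain MAP with cosine-similarity matching on nested lists. It does not implement the paper's GRMAP, whose global response normalization step is not described in enough detail here to reproduce.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [H][W][C]
Mask = List[List[int]]                # indexed [H][W]; 1 marks defect pixels


def masked_average_pool(features: FeatureMap, mask: Mask) -> List[float]:
    """Average the support features over foreground (mask == 1) pixels."""
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for row_f, row_m in zip(features, mask):
        for vec, m in zip(row_f, row_m):
            if m:
                total = [t + v for t, v in zip(total, vec)]
                count += 1
    if count == 0:
        raise ValueError("support mask has no foreground pixels")
    return [t / count for t in total]


def similarity_map(query: FeatureMap, prototype: List[float]) -> List[List[float]]:
    """Cosine similarity between each query feature vector and the prototype."""
    def cos(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) or 1e-12  # avoid division by zero
        nb = math.sqrt(sum(x * x for x in b)) or 1e-12
        return dot / (na * nb)
    return [[cos(vec, prototype) for vec in row] for row in query]


support = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
support_mask = [[1, 0], [1, 0]]
prototype = masked_average_pool(support, support_mask)
scores = similarity_map(support, prototype)
```

Thresholding the similarity map (the cut-off would be a tuning choice) gives a binary defect prediction for the query image.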

  • Research Article
  • 10.1177/17298806261430024
Understanding physical properties of unseen deformable objects by leveraging large-language models and robot actions
  • Jan 1, 2026
  • International Journal of Advanced Robotic Systems
  • Changmin Park + 4 more

In this article, we consider the problem of understanding the physical properties of unseen objects through interactions between the objects and a robot. Handling unseen objects with special properties such as deformability is challenging for traditional task and motion planning approaches, as they often rely on the closed-world assumption. Recent results in large language model (LLM)-based task planning have shown the ability to reason about unseen objects. However, most studies assume rigid objects, overlooking their physical properties. We propose an LLM-based method for probing the physical properties of unseen deformable objects for the purpose of task planning. For a given set of object properties (e.g. foldability, bendability), our method uses robot actions to determine the properties by interacting with the objects. Based on the properties examined by the LLM and robot actions, the LLM generates a task plan for a specific domain such as object packing. In the experiment, we show that the proposed method can identify properties of deformable objects, which are then used in a bin-packing task where these properties play a crucial role in success.

  • Research Article
  • 10.1109/lra.2026.3673994
3D Cal: An Open-Source Software Library for Depth Reconstruction on Vision-Based Tactile Sensors
  • Jan 1, 2026
  • IEEE Robotics and Automation Letters
  • Rohan Kota + 3 more

Tactile sensing plays a key role in enabling dexterous and reliable robotic manipulation, but realizing this capability requires substantial calibration to convert raw sensor readings into physically meaningful quantities. Despite its near-universal necessity, the calibration process remains ad hoc and labor-intensive. Here, we introduce 3D Cal, an open-source library that transforms a low-cost 3D printer into an automated probing device capable of generating large volumes of labeled training data for calibrating vision-based tactile sensors. 3D Cal also provides an end-to-end, user-friendly pipeline for training custom convolutional networks to produce high-quality depth reconstructions. Using 3D Cal, we systematically explore the relationship between training data volume and spatial reconstruction performance on two commercially available sensors, DIGIT and GelSight Mini, and derive practical, empirically-grounded guidelines for calibrating these sensors. Finally, we demonstrate depth reconstruction performance on the DIGIT and GelSight Mini comparable to state-of-the-art methods, achieving average reconstruction errors of 156 μm and 205 μm on unseen objects, respectively. By automating tactile sensor calibration, 3D Cal can accelerate tactile sensing research, simplify sensor deployment, and facilitate the integration of tactile sensing in robotic platforms.

  • Research Article
  • 10.1109/tmm.2026.3673587
Deep Learning-Driven Segmentation of Unseen Objects in Indoor Robotic Environments
  • Jan 1, 2026
  • IEEE Transactions on Multimedia
  • Ying Zhang + 6 more

For robots operating in unstructured indoor environments, the ability to accurately perceive and interact with unseen objects is crucial, because it is impractical to assume that every object in the environment has been modeled. Despite advances in this direction, recognizing unseen objects remains a challenging perceptual task. The success of deep learning (DL) in many fields has promoted the development of DL-based methods to address this challenge. This paper presents a comprehensive review of DL-driven Unseen Object Instance Segmentation (UOIS). To the best of our knowledge, it is the first review of state-of-the-art UOIS solutions: existing reviews focus on known-object segmentation, on learning-based network models, or on parts of these, and little attention has been paid to DL-based UOIS. In addition to comprehensively evaluating the existing UOIS literature, we classify DL-based UOIS methods by their technical characteristics. We then outline popular synthetic and benchmark datasets for UOIS and compare the performance of DL-based UOIS methods on them. Finally, we discuss current challenges and suggest potential research opportunities.

  • Research Article
  • 10.1109/lra.2026.3674002
DiffusionHandover: Reliable Human-to-Robot Handover Generation with Anthropomorphic Hand
  • Jan 1, 2026
  • IEEE Robotics and Automation Letters
  • Yifan Yang + 7 more

Human-to-robot handover is a fundamental capability in human-robot interaction, critical for effective collaboration in service and assistive domains. Despite recent progress, ensuring both reliability and safety (particularly collision-free interaction with the human hand) remains a major challenge, especially when using anthropomorphic robotic hands. In this work, we propose DiffusionHandover, a novel framework built on a Decomposed Vector-Quantized VAE (DVQ-VAE) latent diffusion model, further enhanced with reinforcement learning from human feedback (RLHF) to improve grasp reliability and alignment with human preferences. We validate our approach extensively in both simulation and the real world using a Schunk SVH anthropomorphic hand. Our method achieves an average success rate above 80% with diverse grasp configurations on unseen objects. In addition, we conduct ablation studies to assess individual submodules, as well as comparative evaluations against state-of-the-art baselines.

  • Research Article
  • 10.1109/tmm.2026.3668516
Det-Agent: Open-Vocabulary Object Localization and Detection with Reinforcement Learning Agent
  • Jan 1, 2026
  • IEEE Transactions on Multimedia
  • Ruisong Zhang + 3 more

Object detection, which aims to locate and recognize objects in images, is evolving toward reduced reliance on manual annotations and enhanced adaptability to open-world scenarios. This shift has led to open-vocabulary object detection (OVD), which enables zero-shot detection of objects from novel categories beyond the base categories. In this work, we identify three key challenges in detecting unseen class instances: 1) locating the instances of new classes; 2) distinguishing new class instances from the background; 3) recognizing new class instances. We propose a detection framework that leverages vision-language pre-trained (VLPT) models, such as CLIP, as the backbone to jointly address these three challenges. Specifically, we treat localization as a box-deformation decision process, where the agent interacts with the image to learn a universal deformation strategy, enhancing generalization for unseen class objects. We further reformulate the foreground-background classification as an objectness ranking task to improve objectness evaluation, utilizing a specially designed AP loss. Additionally, a feature magnitude minimization constraint is introduced for the adapter during fine-tuning, boosting recognition performance for both base and novel classes. Experiments on COCO and LVIS datasets demonstrate that our method outperforms previous approaches in open-vocabulary object detection.
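Treating foreground-background classification as objectness ranking means the quantity being optimized is the average precision (AP) of the ordering rather than per-box accuracy. AP is piecewise constant in the scores, so an AP loss needs a surrogate or a specialized update rule; the sketch below only computes the target metric, to show that the ordering of positives and negatives, not a fixed score threshold, determines its value. It is not the paper's AP loss.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP of a ranking: mean precision at each positive's rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:  # a foreground box
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Same scores, different placement of the foreground boxes in the ranking.
perfect = average_precision([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0])
interleaved = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Here `perfect` is 1.0, while `interleaved` is (1/1 + 2/3) / 2, because one background box is ranked above a foreground one.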

  • Research Article
  • 10.1109/lra.2026.3678126
Generalizable Hierarchical Skill Learning via Object-Centric Representation
  • Jan 1, 2026
  • IEEE Robotics and Automation Letters
  • Haibo Zhao + 11 more

We present Generalizable Hierarchical Skill Learning (GSL), a novel framework for hierarchical policy learning that significantly improves policy generalization and sample efficiency in robot manipulation. One core idea of GSL is to use object-centric skills as an interface that bridges the high-level vision-language model and the low-level visual-motor policy. Specifically, GSL decomposes demonstrations into transferable and object-canonicalized skill primitives using foundation models, ensuring efficient low-level skill learning in the object frame. At test time, the skill-object pairs predicted by the high-level agent are fed to the low-level module, where the inferred canonical actions are mapped back to the world frame for execution. This structured yet flexible design leads to substantial improvements in sample efficiency and generalization of our method across unseen spatial arrangements, object appearances, and task compositions. In simulation, GSL trained with only 3 demonstrations per task outperforms baselines trained with 30 times more data by 15.5% on unseen tasks. In real-world experiments, GSL also surpasses the baseline trained with 10 times more data.

  • Research Article
  • 10.1109/tpami.2026.3651728
Unleashing the Power of Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation.
  • Jan 1, 2026
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Duo Peng + 6 more

Category-Agnostic Pose Estimation (CAPE) aims to detect keypoints of unseen object categories in a few-shot setting, where the scarcity of labeled data poses significant challenges to generalization. In this work, we propose Prompt Pose Matching (PPM), a novel framework that unleashes the power of off-the-shelf text-to-image diffusion models for CAPE. PPM learns pseudo prompts from few-shot examples via the text-to-image diffusion model. These learned pseudo prompts capture semantic information about keypoints, which can then be used to locate the same type of keypoints in images. To provide prompts with a representative initialization, we introduce a category-agnostic pre-training strategy that captures the foreground prior shared across categories and keypoints. To support reliable prompt pre-training, we propose a Foreground-Aware Region Aggregation (FARA) module that provides robust and consistent supervision signals. Based on the foreground prior, a Foreground-Guided Attention Refinement (FGAR) module is further proposed to reinforce cross-attention responses for accurate keypoint localization. For efficiency, a Prompt Ensemble Inference (PEI) scheme enables joint keypoint prediction. Unlike previous methods that rely heavily on annotated base-category data, our PPM framework can operate in a base-category-free setting while retaining strong performance. Code will be available at: https://github.com/DuoPeng-CVer/Prompt-Pose-Matching.

  • Research Article
  • Cited by 1
  • 10.1109/tro.2026.3651674
RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction With Spatio-Temporal Aggregation
  • Jan 1, 2026
  • IEEE Transactions on Robotics
  • Naman Patel + 2 more

Mapping and understanding complex 3D environments is fundamental to how autonomous systems perceive and interact with the physical world, requiring both precise geometric reconstruction and rich semantic comprehension. While existing 3D semantic mapping systems excel at reconstructing and identifying predefined object instances, they lack the flexibility to efficiently build semantic maps with open-vocabulary during online operation. Although recent vision-language models have enabled open-vocabulary object recognition in 2D images, they haven't yet bridged the gap to 3D spatial understanding. The critical challenge lies in developing a training-free unified system that can simultaneously construct accurate 3D maps while maintaining semantic consistency and supporting natural language interactions in real time. In this paper, we develop a zero-shot framework that seamlessly integrates GPU-accelerated geometric reconstruction with open-vocabulary vision-language models through online instance-level semantic embedding fusion, guided by hierarchical object association with spatial indexing. Our training-free system achieves superior performance through incremental processing and unified geometric-semantic updates, while robustly handling 2D segmentation inconsistencies. The proposed general-purpose 3D scene understanding framework can be used for various tasks including zero-shot 3D instance retrieval, segmentation, and object detection to reason about previously unseen objects and interpret natural language queries.

  • Research Article
  • 10.56276/tasdiq.v7i1.10
The World of Technique of Nayyar Masood (نیر مسعود کا جہانِ تکنیک)
  • Dec 17, 2025
  • TAṢDĪQ
  • Hafiz Muhammad Awais

Nayyar Masood is an expert and passionate storyteller. He belongs to the group of writers who have created a style that never fails to captivate the reader by combining ancient and modern modes of storytelling. His style and techniques show a distinctive diversity. Alongside his excellent similes and detailed descriptions, he also applies different techniques within those details to surprise the reader. Where he tells a story through plain narration, he still makes it distinctive by giving it a new twist through flashback, foreshadowing, and dialogue. He does not limit himself to a single technique in a story but uses several together: sometimes he works the magic of circular and montage techniques alongside flashback, and sometimes he embellishes different themes with scenography, surrealism, and magical realism. Moral values, changing civilization, memories of the past, and the mystery of life are depicted in a way that is quite different yet familiar. The magical atmosphere of his stories strongly affects the reader, and this element is especially pronounced in Nayyar Masood's work. Through the use of hallucinations, this magic turns into mystery that binds the reader from all sides, as if a fence made of some unseen object had been built around him.

  • Research Article
  • 10.3390/automation6040084
Reliable Detection of Unsafe Scenarios in Industrial Lines Using Deep Contrastive Learning with Bayesian Modeling
  • Dec 2, 2025
  • Automation
  • Jesús Fernández-Iglesias + 2 more

Current functional safety mechanisms mainly control the access points and perimeters of manufacturing cells. They do not guarantee the integrity of the cells' internal components or the absence of unauthorized people or objects. In this work, we present a novel deep learning (DL)-based safety system that enhances a safety circuit designed according to functional safety principles. The system detects the presence of persons within the cell with great reliability, and anomalous elements of any kind with high precision. Our approach follows a two-stage DL methodology that combines contrastive learning with Bayesian clustering. First, a supervised contrastive scheme learns the characteristics of safe scenarios and distinguishes them from unsafe ones caused by workers remaining inside the cell. Next, a Bayesian mixture models the latent space of safe scenarios, quantifying deviations and enabling the detection of previously unseen anomalous objects without any specific fine-tuning. To further improve robustness, we introduce an ensemble-based hybrid latent-space methodology that maximizes performance regardless of the characteristics of the underlying encoders. Experiments are conducted on a real dataset captured in a production belt-picking cell. The proposed system achieves 100% accuracy in distinguishing safe scenarios from those in which workers are present, even in partially occluded cases. It also achieves an average area under the curve (AUC) of 0.9984 across seven types of anomalous objects commonly found in manufacturing environments. Finally, for interpretability analysis, we design a patch-based feature-ablation framework that demonstrates the model's reliability under uncertainty and the absence of learning biases. The proposed technique enables the deployment of an innovative, high-performance safety system that, to our knowledge, does not yet exist in industry.
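The second stage "quantifies deviations" from a mixture model of safe latent vectors. One common way to do this is to score an embedding by its negative log-likelihood under the mixture and flag it when the score exceeds a threshold calibrated on safe data. The sketch below assumes the mixture has already been fitted, for example by variational Bayesian inference. It also assumes diagonal covariances and a quantile-based threshold. These are illustrative choices, not details confirmed by the abstract. The function names are hypothetical.

```python
import math


def log_gauss_diag(z, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at z."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (x - m) ** 2 / v for x, m, v in zip(z, mean, var)
    )


def mixture_log_likelihood(z, weights, means, variances):
    """log p(z) under a diagonal Gaussian mixture, computed with log-sum-exp."""
    terms = [math.log(w) + log_gauss_diag(z, m, v)
             for w, m, v in zip(weights, means, variances) if w > 0]
    top = max(terms)  # subtract the max for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def anomaly_score(z, mixture):
    """Negative log-likelihood: higher means less like the safe scenarios."""
    return -mixture_log_likelihood(z, *mixture)


def calibrate_threshold(safe_latents, mixture, quantile=0.99):
    """Empirical quantile of anomaly scores on held-out safe embeddings."""
    scores = sorted(anomaly_score(z, mixture) for z in safe_latents)
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]


def is_anomalous(z, mixture, threshold):
    # A latent vector is flagged when it is less likely than the calibrated bound.
    return anomaly_score(z, mixture) > threshold
```

For example, consider a two-component mixture of safe scenes centred at (0, 0) and (5, 5). An embedding at (20, -20) receives a much higher score than any safe sample and is flagged. Embeddings near either centre are not flagged. The quantile trades false alarms against missed anomalies. A safety-critical deployment would set it from validation data rather than using a fixed default.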

  • Research Article
  • 10.1109/tpami.2025.3600413
Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, With Single Reference.
  • Dec 1, 2025
  • IEEE transactions on pattern analysis and machine intelligence
  • Yuan Gao + 4 more

Humans can easily deduce the relative pose of a previously unseen object, without labeling or training, given only a single query-reference image pair. They arguably do so by combining i) 3D/2.5D shape perception from a single image, ii) render-and-compare simulation, and iii) awareness of rich semantic cues that furnish (coarse) reference-query correspondence. Motivated by this, we propose a novel generalizable 3D relative pose estimation method. It realizes 3D/2.5D shape perception with a 2.5D shape from an RGB-D reference, fulfills the render-and-compare paradigm with an off-the-shelf differentiable renderer, and leverages semantic cues from a pretrained model such as DINOv2. Specifically, our differentiable renderer takes a 2.5D rotatable mesh textured by the RGB image and by semantic maps obtained by DINOv2 from the RGB input. It then renders new RGB and semantic maps, with back-surface culling, under a novel rotated view. The refinement loss is computed by comparing the rendered RGB and semantic maps with those of the query, and its gradients are back-propagated through the differentiable renderer to refine the 3D relative pose. As a result, our method can be readily applied to unseen objects given only a single RGB-D reference, without labeling or training. Extensive experiments on LineMOD, LM-O, and YCB-V show that our training-free method significantly outperforms state-of-the-art supervised methods, especially under the rigorous Acc@5°/10°/15° metrics and in challenging cross-dataset settings.
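The Acc@5°/10°/15° figures quoted above are rotation-accuracy thresholds. A minimal sketch of how such metrics are commonly computed follows. It assumes the usual definition: the geodesic angle between the predicted and ground-truth rotation matrices, counted as correct when strictly below the threshold. The paper's exact convention is not stated in this abstract. `rotation_error_deg`, `acc_at`, and the helper `rot_z` are hypothetical names used only for illustration.

```python
import math


def matmul_t(a, b):
    """Return a^T b for 3x3 matrices given as nested lists."""
    return [[sum(a[k][i] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_error_deg(r_pred, r_gt):
    """Geodesic angle in degrees: acos((trace(R_pred^T R_gt) - 1) / 2)."""
    m = matmul_t(r_pred, r_gt)
    cos_theta = (m[0][0] + m[1][1] + m[2][2] - 1.0) / 2.0
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against round-off
    return math.degrees(math.acos(cos_theta))


def acc_at(errors_deg, threshold_deg):
    """Fraction of samples whose rotation error is below the threshold."""
    return sum(e < threshold_deg for e in errors_deg) / len(errors_deg)


def rot_z(deg):
    """Rotation about the z-axis (illustrative helper)."""
    t = math.radians(deg)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]
```

For instance, `rotation_error_deg(rot_z(10.0), rot_z(3.0))` evaluates to 7 degrees (up to floating-point error). For per-sample errors of 2°, 7°, 12°, and 20°, Acc@5, Acc@10, and Acc@15 are 0.25, 0.5, and 0.75.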


Popular topics

  • Latest Artificial Intelligence papers
  • Latest Nursing papers
  • Latest Psychology Research papers
  • Latest Sociology Research papers
  • Latest Business Research papers
  • Latest Marketing Research papers
  • Latest Social Research papers
  • Latest Education Research papers
  • Latest Accounting Research papers
  • Latest Mental Health papers
  • Latest Economics papers
  • Latest Climate Change Research papers
  • Latest Mathematics Research papers

Most cited papers

  • Most cited Artificial Intelligence papers
  • Most cited Nursing papers
  • Most cited Psychology Research papers
  • Most cited Sociology Research papers
  • Most cited Business Research papers
  • Most cited Marketing Research papers
  • Most cited Social Research papers
  • Most cited Education Research papers
  • Most cited Accounting Research papers
  • Most cited Mental Health papers
  • Most cited Economics papers
  • Most cited Climate Change Research papers
  • Most cited Mathematics Research papers

Latest papers from journals

  • Scientific Reports latest papers
  • PLOS ONE latest papers
  • Journal of Clinical Oncology latest papers
  • Nature Communications latest papers
  • BMC Geriatrics latest papers
  • Science of The Total Environment latest papers
  • Medical Physics latest papers
  • Cureus latest papers
  • Cancer Research latest papers
  • Chemosphere latest papers
  • International Journal of Advanced Research in Science, Communication and Technology latest papers

Latest papers from institutions

  • Latest research from French National Centre for Scientific Research
  • Latest research from Chinese Academy of Sciences
  • Latest research from Harvard University
  • Latest research from University of Toronto
  • Latest research from University of Michigan
  • Latest research from University College London
  • Latest research from Stanford University
  • Latest research from The University of Tokyo
  • Latest research from Johns Hopkins University
  • Latest research from University of Washington
  • Latest research from University of Oxford
  • Latest research from University of Cambridge

Popular Collections

  • Research on Reduced Inequalities
  • Research on No Poverty
  • Research on Gender Equality
  • Research on Peace Justice & Strong Institutions
  • Research on Affordable & Clean Energy
  • Research on Quality Education
  • Research on Clean Water & Sanitation
  • Research on COVID-19
  • Research on Monkeypox
  • Research on Medical Specialties
  • Research on Climate Justice

Copyright 2026 Cactus Communications. All rights reserved.
