Target Object Research Articles

Deep learning technology can automatically learn features from large amounts of data, with powerful feature extraction and pattern recognition capabilities, thereby improving the accuracy and efficiency of object detection. [The objective of this study]: In order to improve the accuracy and speed of mask wearing deep learning detection models in the post pandemic era, the [Problem this study aimed to resolve] was based on the fact that no research work has been reported on standardized detection models for mask wearing with detecting nose targets specially. [The topic and method of this study]: A mask wearing normalization detection model (towards the wearing style exposing the nose to outside, which is the most obvious characteristic of non-normalized style) based on improved YOLOv5s (You Only Look Once v5s is an object detection network model) was proposed. [The improved method of the proposed model]: The improvement design work of the detection model mainly includes (1) the BottleneckCSP (abbreviation of Bottleneck Cross Stage Partial) module was improved to a BottleneckCSP-MASK (abbreviation of Bottleneck Cross Stage Partial-MASK) module, which was utilized to replace the BottleneckCSP module in the backbone architecture of the original YOLOv5s model, which reduced the weight parameters' number of the YOLOv5s model while ensuring the feature extraction effect of the bonding fusion module. (2) An SE module was inserted into the proposed improved model, and the bonding fusion layer in the original YOLOv5s model was improved for better extraction of the features of mask and nose targets. [Results and validation]: The experimental results indicated that, towards different people and complex backgrounds, the proposed mask wearing normalization detection model can effectively detect whether people are wearing masks and whether they are wearing masks in a normalized manner. The overall detection accuracy was 99.3% and the average detection speed was 0.014 s/pic. Contrasted with original YOLOv5s, v5m, and v5l models, the detection results for two types of target objects on the test set indicated that the mAP of the improved model increased by 0.5%, 0.49%, and 0.52%, respectively, and the size of the proposed model compressed by 10% compared to original v5s model. The designed model can achieve precise identification for mask wearing behaviors of people, including not wearing a mask, normalized wearing, and wearing a mask non-normalized.

Read full abstract

Object Goal Navigation(ObjectNav) is the task that an agent need navigate to an instance of a specific category in an unseen environment through visual observations within limited time steps. This work plays a significant role in enhancing the efficiency of locating specific items in indoor spaces and assisting individuals in completing various tasks, as well as providing support for people with disabilities. To achieve efficient ObjectNav in unfamiliar environments, global perception capabilities, understanding the regularities of space and semantics in the environment layout are significant. In this work, we propose an explicit-prediction method called VLAI that utilizes visual-language alignment information to guide the agent's exploration, unlike previous navigation methods based on frontier potential prediction or egocentric map completion, which only leverage visual observations to construct semantic maps, thus failing to help the agent develop a better global perception. Specifically, when predicting long-term goals, we retrieve previously saved visual observations to obtain visual information around the frontiers based on their position on the incrementally built incomplete semantic map. Then, we apply our designed Chat Describer to this visual information to obtain detailed frontier object descriptions. The Chat Describer, a novel automatic-questioning approach deployed in Visual-to-Language, is composed of Large Language Model(LLM) and the visual-to-language model(VLM), which has visual question-answering functionality. In addition, we also obtain the semantic similarity of target object and frontier object categories. Ultimately, by combining the semantic similarity and the boundary descriptions, the agent can predict the long-term goals more accurately. Our experiments on the Gibson and HM3D datasets reveal that our VLAI approach yields significantly better results compared to earlier methods. The code is released athttps://github.com/31539lab/VLAI.

Read full abstract

Target Object Research Articles

Related Topics

Articles published on Target Object

Alignable kernel network

Shape-Guided Detection: A joint network combining object detection and underwater image enhancement together

Comparative Analysis of Classification and Segmentation Performance of Different Dog Breeds Based on Mask R-CNN

Algae-based self-driven microrobot for efficient removal of nanoplastics from water environment

Semantic ghost imaging based on semantic coding

Visual search and real-image similarity: An empirical assessment through the lens of deep learning.

Label-Free Exosome Analysis by Surface-Enhanced Raman Scattering Spectroscopy with Laser-Ablated Silver Nanoparticle Substrate.

Predicting the higher heating value of products through solid yield in torrefaction process

3D Visual Grounding-Audio: 3D scene object detection based on audio

Global–Local Query-Support Cross-Attention for Few-Shot Semantic Segmentation

Stabilizing Dynamic Backscatter for Swift and Accurate Object Tracking

Individual differences in proprioceptive reorientation: a study on body characteristics and Posturography

Deep Learning-Based Biomimetic Identification Method for Mask Wearing Standardization.

Can 5 Minutes of Finger Actions Boost Creative Incubation?

Zero-shot visual grounding via coarse-to-fine representation learning

Research on a small target object detection method for aerial photography based on improved YOLOv7

A Robust Tri-Electromagnet-Based 6-DoF Pose Tracking System Using an Error-State Kalman Filter.

Is there a cost when predictions are not met? A VWP study investigating L1 and L2 speakers.

VLAI: Exploration and Exploitation based on Visual-Language Aligned Information for Robotic Object Goal Navigation

Depth range extended digital holography for precise 3D profile imaging via dual frequency interval sweeping

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Target Object Research Articles

Related Topics

Articles published on Target Object

Alignable kernel network

Shape-Guided Detection: A joint network combining object detection and underwater image enhancement together

Comparative Analysis of Classification and Segmentation Performance of Different Dog Breeds Based on Mask R-CNN

Algae-based self-driven microrobot for efficient removal of nanoplastics from water environment

Semantic ghost imaging based on semantic coding

Visual search and real-image similarity: An empirical assessment through the lens of deep learning.

Label-Free Exosome Analysis by Surface-Enhanced Raman Scattering Spectroscopy with Laser-Ablated Silver Nanoparticle Substrate.

Predicting the higher heating value of products through solid yield in torrefaction process

3D Visual Grounding-Audio: 3D scene object detection based on audio

Global–Local Query-Support Cross-Attention for Few-Shot Semantic Segmentation

Stabilizing Dynamic Backscatter for Swift and Accurate Object Tracking

Individual differences in proprioceptive reorientation: a study on body characteristics and Posturography

Deep Learning-Based Biomimetic Identification Method for Mask Wearing Standardization.

Can 5 Minutes of Finger Actions Boost Creative Incubation?

Zero-shot visual grounding via coarse-to-fine representation learning

Research on a small target object detection method for aerial photography based on improved YOLOv7

A Robust Tri-Electromagnet-Based 6-DoF Pose Tracking System Using an Error-State Kalman Filter.

Is there a cost when predictions are not met? A VWP study investigating L1 and L2 speakers.

VLAI: Exploration and Exploitation based on Visual-Language Aligned Information for Robotic Object Goal Navigation

Depth range extended digital holography for precise 3D profile imaging via dual frequency interval sweeping