Abstract
Numerous convolutional neural network (CNN) based detection models have recently been proposed and have shown excellent performance. However, because these models are generally developed to detect objects at the class level (e.g., person, car), finding a specific object requires additional training with numerous datasets. This paper proposes a model that accurately detects a specific person by using top clothing color information without any additional training. The proposed method combines CNN-based instance segmentation and pose estimation to exploit the advantages of both techniques. To avoid redundant computation, the two schemes are implemented as a filtering-based sequential structure. As a result, the proposed method achieves 92.57% accuracy in detecting a specific person with only a slight decrease in processing speed. Furthermore, the proposed model is efficiently ported to a heterogeneous embedded platform (i.e., NVIDIA Jetson AGX Xavier) using a parallel processing technique to maximize hardware utilization.
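The filtering-based sequential structure described above can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the filter criterion, so the loose whole-mask color check in stage 1 is an assumption, as are the tolerance values, the keypoint names, and every function name. `estimate_pose` is a hypothetical stand-in for a real pose estimator, and the masks are assumed to come from an instance segmentation model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pixel = Tuple[int, int]            # (row, col)
RGB = Tuple[int, int, int]
Image = Sequence[Sequence[RGB]]    # image[row][col] -> (r, g, b)
Keypoints = Dict[str, Pixel]       # e.g. {"left_shoulder": (row, col), ...}


def mean_color(image: Image, pixels: List[Pixel]) -> RGB:
    """Average RGB over the given pixels (the list must be non-empty)."""
    sums = [0, 0, 0]
    for r, c in pixels:
        for ch in range(3):
            sums[ch] += image[r][c][ch]
    n = len(pixels)
    return (sums[0] // n, sums[1] // n, sums[2] // n)


def color_matches(color: RGB, target: RGB, tol: int) -> bool:
    """Toy criterion: every channel lies within `tol` of the target."""
    return all(abs(a - b) <= tol for a, b in zip(color, target))


def torso_pixels(mask: List[Pixel], kp: Keypoints) -> List[Pixel]:
    """Keep the mask pixels inside the box spanned by shoulders and hips."""
    names = ("left_shoulder", "right_shoulder", "left_hip", "right_hip")
    pts = [kp[name] for name in names]
    r0, r1 = min(p[0] for p in pts), max(p[0] for p in pts)
    c0, c1 = min(p[1] for p in pts), max(p[1] for p in pts)
    return [(r, c) for r, c in mask if r0 <= r <= r1 and c0 <= c <= c1]


def find_target_persons(
    image: Image,
    masks: List[List[Pixel]],
    estimate_pose: Callable[[int], Keypoints],  # hypothetical pose estimator
    target: RGB,
    coarse_tol: int = 120,  # assumed value, loose on purpose
    fine_tol: int = 40,     # assumed value
) -> List[int]:
    """Return indices of persons whose upper-body color matches `target`."""
    matches = []
    for idx, mask in enumerate(masks):
        # Stage 1 (assumed filter): cheap color check on the whole mask.
        if not mask or not color_matches(mean_color(image, mask), target, coarse_tol):
            continue
        # Stage 2: pose estimation runs only for persons that passed stage 1.
        torso = torso_pixels(mask, estimate_pose(idx))
        if torso and color_matches(mean_color(image, torso), target, fine_tol):
            matches.append(idx)
    return matches
```

Restricting the color test to mask pixels inside the torso box is what keeps background and trouser pixels out of the estimate. Running the pose estimator only on persons that survive the first stage is one way to avoid the redundant computation the abstract mentions.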
Highlights
With the development of hardware accelerators like graphics processing units (GPUs), deep learning (DL) has become widely used in various computer vision (CV) tasks, such as image classification [1]–[3], object detection [4]–[7], segmentation [8]–[12], and pose estimation [13]–[17], and has shown remarkable performance.
NETWORK STRUCTURE: This paper proposes a mask-pose fusion model that combines the representative instance segmentation model, YOLACT++ [10], and the representative pose estimation model, AlphaPose [17], to identify a specific person in real time using the precise position of the upper body.
EXPERIMENTAL ENVIRONMENTS: To verify the performance of the proposed design, the accuracy and processing speed are evaluated on an RTX-2080 GPU with the COCO pre-trained weights of YOLACT++ and AlphaPose.
Summary
With the development of hardware accelerators like graphics processing units (GPUs), deep learning (DL) has become widely used in various computer vision (CV) tasks, such as image classification [1]–[3], object detection [4]–[7], segmentation [8]–[12], and pose estimation [13]–[17], and has shown remarkable performance. Several prior studies have focused on DL-based facial recognition schemes [18], [22]–[24]; in practical environments, it may be necessary to identify specific persons in CCTV images. In such cases, there are significant limitations to performing accurate facial recognition, such as resolution and noise problems [25]. All these approaches can be used for specific person detection. Their characteristics are as follows: YOLOv3 [5], a representative object detection model, predicts classes using binary cross-entropy loss and creates anchor boxes through clustering to detect bounding boxes. It is not suitable for use as an upper-body garment color discrimination model.
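As a small illustration of the binary cross-entropy class prediction mentioned above, the sketch below scores each class with an independent sigmoid and averages the per-class losses. It is a generic textbook formulation, not code from YOLOv3 or from this paper, and the function names are chosen here for illustration only.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_cross_entropy(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy, one independent sigmoid per class."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for z, t in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(logits)
```

Because each class has its own sigmoid, the class scores do not compete as they would under a softmax, so one box can carry several labels at once. The labels are still class-level (e.g., person), which is consistent with the summary's point that such a detector alone does not discriminate garment color.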