Abstract

We have identified a need in visual relationship detection and biometrics research for a dataset and model focused on person-clothing pairs. Prior to our Relatable Clothing dataset, there were no publicly available datasets usable for “worn” and “unworn” clothing detection. In this paper, we propose a novel visual relationship model architecture for “worn” and “unworn” clothing detection that uses a soft attention mechanism for feature fusion between a conventional ResNet backbone and our novel person-clothing mask feature extraction architecture. The best proposed model achieves 98.62% accuracy, 99.50% precision, 98.31% recall, and 99.14% specificity on the Relatable Clothing dataset, outperforming our previous iterations. We release our models on the Relatable Clothing GitHub repository (https://github.com/th-truong/relatable_clothing) for future research and applications in detecting and analyzing person-clothing pairs.
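The soft attention fusion described above can be sketched in miniature. This is a hypothetical, simplified illustration only, not the paper's learned attention module: the function names are invented, and the channel activations themselves stand in for a learned scoring layer. Per channel, a softmax over the two sources yields weights that sum to 1, so the fused feature is a convex combination of the backbone feature and the mask feature.

```python
import math

def softmax2(a, b):
    # numerically stable softmax over two scores
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    s = ea + eb
    return ea / s, eb / s

def soft_attention_fuse(backbone_feat, mask_feat):
    """Per-channel convex combination of two feature vectors.

    Illustrative stand-in for a learned attention module: here the
    activations themselves serve as the attention scores.
    """
    fused = []
    for x, y in zip(backbone_feat, mask_feat):
        wx, wy = softmax2(x, y)          # weights sum to 1 per channel
        fused.append(wx * x + wy * y)    # weighted sum of the two sources
    return fused
```

In the actual network, the scores would come from learned layers over feature maps rather than the raw activations, but the softmax-weighted combination is the defining property of soft attention.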

Highlights

  • State-of-the-art computer vision models for object detection have been rapidly progressing

  • Subsections IV-B to IV-F cover the hyperparameter search and ablation study of various modules of the proposed visual relationship detection network. These studies use the ResNet50 with Feature Pyramid Network (FPN) backbone for all experiments because of its popularity in the literature, relatively high object detection performance, and small model size relative to other backbone networks used in visual relationship detection

  • A mAP@0.5:0.95 of around 30% is expected for these models when compared to training on Common Objects in Context (COCO), as provided by PyTorch [5]


Introduction

State-of-the-art computer vision models for object detection have been rapidly progressing. Instance segmentation, semantic segmentation, and visual relationship detectors have all achieved outstanding performance on public datasets. The performance of these models enables interesting applications in the safety and security industry. Bounding box detection and instance segmentation are concerned with classifying regions in a provided image as belonging to a certain class of object. A bounding box is a box, defined by pixel coordinates, that fully contains the object. Instance segmentation takes the object detection problem one step further and attempts to provide pixel-level annotations for objects of interest. Semantic segmentation involves providing pixel-level annotations for every pixel in a provided image.
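The relationship between the two annotation types above can be made concrete: an instance segmentation mask labels individual pixels, and the tightest bounding box that fully contains those pixels can be derived from it. This is a minimal sketch with a made-up 6×6 binary mask, not data from the paper's dataset.

```python
# hypothetical 6x6 binary instance mask (1 = object pixel)
mask = [[0] * 6 for _ in range(6)]
for y in range(2, 5):       # object occupies rows 2-4 ...
    for x in range(1, 4):   # ... and columns 1-3
        mask[y][x] = 1

# collect the pixel coordinates labeled as the object
coords = [(x, y) for y, row in enumerate(mask)
          for x, v in enumerate(row) if v]
xs = [x for x, _ in coords]
ys = [y for _, y in coords]

# tightest box (x0, y0, x1, y1) fully containing the object pixels
box = (min(xs), min(ys), max(xs), max(ys))
print(box)  # (1, 2, 3, 4)
```

Going the other way is not possible without extra information: a box only bounds the object, while the mask says exactly which pixels belong to it, which is why instance segmentation is the harder problem.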

