Abstract

Given the lack of sufficient annotated data for some computer vision applications, researchers often supplement training with images collected from other sources. To measure the effect of these added data, we compare detection results on a customized object dataset, using the same detection model while varying the training data fed to the network. In our work, we run detection on images captured by the Microsoft Kinect sensor after training the network on different combinations of training data. The first part of the training data is captured by the Kinect itself; the second is gathered from several internet sources and is referred to as the collected images. We then vary the distribution of these images between the training and validation sets before feeding them to the fixed training model. The results show that this distribution of data can considerably affect training and detection results under the same model parameters. In addition, mixing the captured images with collected ones can improve these results.

Highlights

  • The recent decade has witnessed a dramatic increase in the ability to classify, localize, and detect objects in images

  • The authors applied the meta-architectures for object detection, namely SSD (Single Shot MultiBox Detector) [26], Faster R-CNN (Faster Region-based Convolutional Neural Network), and R-FCN (Region-based Fully Convolutional Networks) [27], to conduct their experiments and comparisons

  • The results of running detection after training on images captured purely by the Kinect are shown in Figure 3

Summary

Introduction

The recent decade has witnessed a dramatic increase in the ability to classify, localize, and detect objects in images. This success is the result of the advent of powerful Graphics Processing Units (GPUs) and the design of deep convolutional neural network architectures, in addition to the availability of large datasets. We focus on the detection problem, where the model must decide both what objects are in an image and where they appear. We first train and validate the network on images captured by the Kinect. We then replace the training set with collected images and move some of the previously captured images to the validation set. Different numbers of images are used to keep a balance among the instances of each class across the three experiments.
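The three experiments above differ only in how the captured (Kinect) and collected (internet) image pools are distributed between training and validation. A minimal sketch of that redistribution is shown below; the helper name, file names, split ratios, and image counts are illustrative assumptions, not the paper's actual figures.

```python
import random

def make_split(captured, collected, experiment):
    """Build (train, val) file lists for one of three experiments.

    experiment 1: train and validate on captured (Kinect) images only
    experiment 2: train on collected images, validate on captured ones
    experiment 3: mix collected and captured images in training
    Hypothetical helper; the paper's exact splits and counts differ.
    """
    rng = random.Random(0)          # fixed seed so splits are reproducible
    captured = sorted(captured)     # copy inputs before shuffling/slicing
    collected = sorted(collected)
    if experiment == 1:
        rng.shuffle(captured)
        cut = int(0.8 * len(captured))          # assumed 80/20 split
        return captured[:cut], captured[cut:]
    if experiment == 2:
        return list(collected), list(captured)
    if experiment == 3:
        half = len(captured) // 2
        return list(collected) + captured[:half], captured[half:]
    raise ValueError("experiment must be 1, 2, or 3")

# Illustrative pools (file names are placeholders)
captured = [f"kinect_{i:03d}.jpg" for i in range(100)]
collected = [f"web_{i:03d}.jpg" for i in range(200)]

train, val = make_split(captured, collected, experiment=3)
print(len(train), len(val))  # 250 50
```

Keeping the detection model and its hyperparameters fixed while only this split function changes is what isolates the effect of the data distribution on the results.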