A robot that picks and places the wide variety of items found in a logistics warehouse must detect and recognize items in images and then decide where to grasp them. Our Multi-task Deconvolutional Single Shot Detector (MT-DSSD) simultaneously performs the three tasks required for this manipulation: object detection, semantic segmentation, and grasping detection. MT-DSSD is a multi-task learning (MTL) method based on DSSD; it requires less computation and runs faster than using a separate model for each task. Evaluations on the Amazon Robotics Challenge dataset showed that our model achieves better object detection and segmentation performance than comparable methods, and an ablation study showed that MTL can improve the accuracy of each task. Furthermore, robotic grasping experiments demonstrated that our model can detect appropriate grasping points.