Joint Learning for Multitasking Models

Ajai John Chemmanam, Bijoy A Jose

doi:10.1007/978-981-19-4453-6_11

Abstract

Artificial intelligence using neural networks has made tremendous progress in the field of computer vision. State-of-the-art models have been developed for various computer vision tasks such as image classification, object detection, image segmentation and keypoint estimation. Many of these models are tuned to achieve the highest benchmark scores on curated datasets for a single specific task. However, deploying them in industrial settings often requires multiple models to be used sequentially or as an ensemble to address different business use cases. Better hardware and model optimisation techniques such as quantisation and pruning have been widely used to improve the performance of individual models. In most cases, the features extracted by one model are not effectively reused by the subsequent models. Each model also has its own pre- and post-processing, which adds considerable overhead when scaled to industry requirements. We explore the concept of multitasking architectures and propose a joint learning approach to train a multitasking model that performs object detection, keypoint estimation and instance segmentation together in a single forward pass. Learning to predict multiple closely related tasks should help the model learn better representations of the training data and become more robust to overfitting. Our best performing model achieved 32.26 frames per second (fps) with 41.2 AP on object detection, 38.2 AP on instance segmentation and 53.0 AP on keypoint estimation when evaluated on the COCO validation dataset. A lighter version of the model ran at 41.66 fps, enabling real-time computation for most use cases.

Keywords

Joint learning, Multitasking, Object detection, Instance segmentation, Keypoint estimation

Full Text