Abstract

The convolutional neural network is a deep learning architecture that has dominated most computer vision tasks for several years. Since 2020, however, the Transformer architecture has emerged as a challenger that is expected to replace convolutional neural networks in the near future. Unlike researchers, who explore every new possibility in search of improvements, most practitioners are not aiming to achieve a new state-of-the-art model. This paper examines in detail how easily the two types of architectures allow practitioners to use them in actual applications. The major models of each architecture in each computer vision task are described and summarized according to their task variety, availability, reported performance, and computational resources. In conclusion, this paper finds that the younger Transformer-based models are not inferior in terms of task variety, reported performance, or computational resources; rather, it is the issue of availability that makes Transformer-based models more difficult to use at this moment.

Keywords: Human factor; Simplicity; Artificial intelligence; Deep learning; Convolutional neural network; Transformer
