Abstract

The adoption of deep learning models for computer vision tasks in industry is often hindered by the need to collect enough images to build a dataset sufficiently representative of the problem to be addressed. The common approach to this issue is to start from a pre-trained model and fine-tune it on a domain-specific dataset, which drastically reduces the number of images required while still delivering acceptable performance. Recent advances in neural network architectures, such as transformers, and in self-supervised training techniques have led to a particular class of deep learning models, namely foundational models, that can be easily adapted to a wide range of downstream tasks. In this work, the performance of two foundational models was compared in a real industrial use case requiring the detection and classification of welding defects in small components. Using these models as the backbone for a single linear classification layer, trained on a small, curated dataset of 80 images, achieved an accuracy of around 95% on the test and validation sets with little implementation effort.
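The approach described above (a frozen pre-trained backbone feeding a single linear classification layer trained on a small labeled set) can be sketched as follows. The abstract does not name the foundational models used, so the backbone here is a stand-in: a fixed random projection replaces the real feature extractor, and a tiny synthetic two-class dataset of 80 "images" replaces the welding-defect data. Only the linear layer's weights are trained, via softmax cross-entropy and plain gradient descent; all names and sizes are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM, N_CLASSES, N_TRAIN = 64, 2, 80

# Stand-in for a frozen foundational-model backbone: in practice this
# would be a pre-trained network producing one embedding per image;
# here a fixed random projection plays that role.
W_backbone = rng.normal(size=(32 * 32, EMBED_DIM))

def extract_features(images):
    """Map (n, 32, 32) image arrays to (n, EMBED_DIM) embeddings."""
    return images.reshape(len(images), -1) @ W_backbone

# Tiny synthetic dataset: class-1 samples carry a brighter "defect" patch.
images = rng.normal(size=(N_TRAIN, 32, 32))
labels = rng.integers(0, N_CLASSES, size=N_TRAIN)
images[labels == 1, :8, :8] += 3.0

X = extract_features(images)
X = (X - X.mean(0)) / (X.std(0) + 1e-8)  # standardize the embeddings

# Single linear classification layer; the backbone stays frozen and
# only W and b are updated.
W = np.zeros((EMBED_DIM, N_CLASSES))
b = np.zeros(N_CLASSES)
Y = np.eye(N_CLASSES)[labels]
for _ in range(200):
    logits = X @ W + b
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)          # softmax probabilities
    grad = (p - Y) / N_TRAIN              # cross-entropy gradient
    W -= 0.5 * (X.T @ grad)
    b -= 0.5 * grad.sum(0)

train_acc = ((X @ W + b).argmax(1) == labels).mean()
```

With a reasonably discriminative backbone, this kind of linear probe separates the classes well even from only 80 examples, which is the appeal of the approach for industrial datasets that are expensive to collect.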
