Abstract

Automated face mask classification has recently emerged in response to COVID-19 mask-wearing regulations. The current state of the art for this problem relies on CNN-based methods such as ResNet. However, attention-based models such as Transformers have emerged as an alternative to the status quo. We explored Transformer-based models on the face mask classification task using three architectures: Vision Transformer (ViT), Swin Transformer, and MobileViT. The three models achieved top-1 accuracy scores of 0.9996, 0.9983, and 0.9969, respectively. We concluded that Transformer-based models warrant further exploration and recommended that the research community and industry investigate their integration with CCTV systems.
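The abstract does not describe the experimental setup, so the following is only a minimal sketch of how such an evaluation could look, assuming the `timm` library and an ImageFolder-style binary mask/no-mask dataset. The backbone variants, dataset path, and preprocessing are hypothetical, not the authors' configuration.

```python
# Minimal sketch (not the authors' code): evaluating the three Transformer
# backbones named in the abstract with top-1 accuracy, via the timm library.
import timm
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hypothetical model variants; the abstract does not specify which sizes were used.
BACKBONES = ["vit_base_patch16_224", "swin_base_patch4_window7_224", "mobilevit_s"]

# Assumed preprocessing: 224x224 inputs (some backbones default to other sizes).
tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
test_set = datasets.ImageFolder("face_mask/test", transform=tfm)  # assumed layout
loader = DataLoader(test_set, batch_size=32)

@torch.no_grad()
def top1_accuracy(model: torch.nn.Module, loader: DataLoader) -> float:
    """Fraction of images whose highest-scoring class matches the label."""
    model.eval()
    correct = total = 0
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

for name in BACKBONES:
    # Two output classes: mask / no-mask.
    model = timm.create_model(name, pretrained=True, num_classes=2)
    # ...fine-tune on the training split here before evaluating...
    print(name, top1_accuracy(model, loader))
```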
