Abstract

Gender classification based on image processing remains a performance challenge for classifier models, owing to the various factors that cause visual alterations in collected facial images. This study proposes a Vision Transformer (ViT)-based technique for identifying a person's gender from facial images. It investigates how well a facial image-based model can distinguish between male and female genders, and also examines the rarely discussed question of how performance is affected by the variation and complexity of data arising from differences in racial and age groups. We trained the model on the AFAD dataset and then carried out same-dataset and cross-dataset evaluations, the latter using the UTKFace dataset. In the same-dataset evaluation, the highest validation accuracy of occurs for the image of size pixels with eight patches, while the highest testing accuracy of occurs for the image of size pixels with patches. The cross-dataset evaluation shows that the model works optimally for the image size of pixels with patches, with the model's accuracy, precision, recall, and F1-score being , , , and , respectively. Furthermore, the misclassification analysis shows that the model performs best when classifying the gender of people between 21 and 70 years old. The findings of this study can serve as a baseline for further analysis of the effectiveness of gender classifier models under various physical factors.
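The abstract gives no implementation details, so as a rough illustration of the kind of model it describes (a Vision Transformer that splits a face image into patches and classifies it as male or female), the following is a minimal PyTorch sketch. All names, layer sizes, and the image/patch dimensions here are illustrative assumptions, not the authors' configuration.

    import torch
    import torch.nn as nn

    class TinyViTClassifier(nn.Module):
        # Minimal ViT-style binary classifier: patchify -> transformer encoder -> class head.
        # Hyperparameters are placeholders, not the paper's settings.
        def __init__(self, image_size=64, patch_size=8, dim=128, depth=4, heads=4, num_classes=2):
            super().__init__()
            assert image_size % patch_size == 0, "image size must be divisible by patch size"
            num_patches = (image_size // patch_size) ** 2
            # Patch embedding: a strided convolution maps each patch to a dim-vector.
            self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
            self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
            self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
            encoder_layer = nn.TransformerEncoderLayer(
                d_model=dim, nhead=heads, dim_feedforward=dim * 4, batch_first=True)
            self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
            self.head = nn.Linear(dim, num_classes)

        def forward(self, x):                      # x: (B, 3, H, W)
            x = self.patch_embed(x)                # (B, dim, H/p, W/p)
            x = x.flatten(2).transpose(1, 2)       # (B, num_patches, dim)
            cls = self.cls_token.expand(x.size(0), -1, -1)
            x = torch.cat([cls, x], dim=1) + self.pos_embed
            x = self.encoder(x)
            return self.head(x[:, 0])              # logits from the [CLS] token

    model = TinyViTClassifier(image_size=64, patch_size=8)
    logits = model(torch.randn(4, 3, 64, 64))      # (4, 2) male/female logits

In the experiments summarized above, the image size and the number of patches are the hyperparameters being varied; in a sketch like this they correspond to the image_size and patch_size arguments, with the patch count given by (image_size / patch_size) squared.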
