Machine learning trial to detect sex differences in simple sticker arts of 1606 preschool children.

Keiko Matsubara, Yuko Ohgami, Koji Okamura, Saki Aoto, Maki Fukami, Yukiko Shimada

doi:10.23736/s2724-5276.21.06067-5

Abstract

Previous studies have suggested that drawings by preschool boys and girls show distinguishable differences. However, children's drawings are shaped by too many factors, and are too inherently ambiguous, to serve as a reliable indicator on their own. In the present study, we attempted to develop a machine learning algorithm that classifies the sex of children from their artworks. We studied three types of simple sticker artworks made by 1606 Japanese preschool children aged 51-83 months (803 boys and 803 girls). The artworks were digitized, and simulated data based on the original data were also generated. A logistic regression classifier was built for each dataset and evaluated by stratified ten-fold cross-validation with hyperparameter tuning. A probability score was calculated for each sample and used for sex classification. Prediction performance was evaluated using accuracy, recall, and precision scores, as well as learning curves. The two models, built from the original and the simulated data, showed comparably low metrics. The distributions of probability scores for boys and girls largely overlapped and were indistinguishable. The learning curves of both models showed a pattern of severe underfitting. Our machine learning algorithm was unable to distinguish simple sticker artworks created by boys from those created by girls. More complex tasks will enable the development of an accurate classifier.
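The abstract does not state the authors' software, feature encoding, or tuning grid. What follows is a minimal, dependency-free sketch of the kind of evaluation it names: logistic regression, stratified ten-fold cross-validation, out-of-fold probability scores thresholded at 0.5, and accuracy, recall, and precision. Several choices are assumptions and are not taken from the paper. The tuned hyperparameter is an L2 penalty `lam`, the model is fitted by plain gradient descent, and the helper names (`fit_logreg`, `stratified_folds`, `tune_and_evaluate`, and the others) are hypothetical. Selecting `lam` on the same folds used for reporting is also a simplification; nested cross-validation would avoid the resulting optimistic bias.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, lam, lr=0.1, epochs=100):
    """Fit L2-regularized logistic regression by batch gradient descent (assumed solver)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        # The L2 penalty shrinks the weights but not the bias.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, X):
    # Probability score P(label = 1) for each sample.
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds, preserving the class ratio in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def metrics(y_true, y_pred):
    # Accuracy, recall, and precision, with label 1 as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return acc, recall, precision


def cross_validate(X, y, lam, k=10):
    """Stratified k-fold CV; returns one out-of-fold probability score per sample."""
    scores = [0.0] * len(y)
    for test_idx in stratified_folds(y, k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        model = fit_logreg([X[i] for i in train_idx], [y[i] for i in train_idx], lam)
        for i, p in zip(test_idx, predict_proba(model, [X[i] for i in test_idx])):
            scores[i] = p
    return scores


def tune_and_evaluate(X, y, lams=(0.01, 0.1, 1.0), k=10):
    """Pick the penalty with the best CV accuracy (simplified, non-nested tuning)."""
    best = None
    for lam in lams:
        scores = cross_validate(X, y, lam, k)
        preds = [1 if s >= 0.5 else 0 for s in scores]
        result = metrics(y, preds)
        if best is None or result[0] > best[1][0]:
            best = (lam, result, scores)
    return best


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical placeholder data: 50 samples labeled 1 and 50 labeled 0,
    # with random 4-dimensional features that carry no information about the label.
    y = [1] * 50 + [0] * 50
    X = [[rng.gauss(0, 1) for _ in range(4)] for _ in y]
    lam, (acc, rec, prec), scores = tune_and_evaluate(X, y)
    print(f"lambda={lam} accuracy={acc:.2f} recall={rec:.2f} precision={prec:.2f}")
```

When the features carry no information about the label, as with the random placeholders above, cross-validated accuracy should stay near the 50% chance level, and the probability scores of the two classes should overlap. The abstract reports this kind of pattern for the sticker artworks. A learning curve in which training and validation accuracy both stay low and close together is the usual sign of underfitting.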
