Deep Learning for Voice Gender Identification: Proof-of-concept for Gender-Affirming Voice Care.

Frank Rudzicz, Michael Johns, Jeremy Pinto, Yael Bensoussan, Matthew Crowson, Patrick R. Walden

doi:10.1002/lary.29281

Abstract

The need for gender-affirming voice care in the transgender population has increased over the last decade. Objective outcome measures for assessing the success of these interventions are currently lacking. This study uses neural network models to predict binary gender from short audio samples of "male" and "female" voices. This preliminary work is a proof of concept for further work to develop an AI-assisted treatment outcome measure for gender-affirming voice care. The study design was a retrospective cohort study. Two hundred seventy-eight voices from male and female speakers in the Perceptual Voice Qualities Database were used to train a deep neural network to classify voices as male or female. Each audio sample was mapped to the frequency domain using Mel spectrograms. To optimize model performance, we performed 10-fold cross-validation on the entire dataset. The dataset was split into 80% training, 10% validation, and 10% test. An overall accuracy of 92% was obtained, both per spectrum and per patient. The model was more accurate at recognizing female voices (F1 score of 0.94) than male voices (F1 score of 0.87). This proof-of-concept study shows promising performance that supports further development of an AI-assisted tool to provide objective treatment outcome measurements for gender-affirming voice care. Level of Evidence: 3. Laryngoscope, 131:E1611-E1615, 2021.
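The two accuracy views and the per-class F1 scores reported above can be illustrated with a minimal sketch. The abstract does not say how spectrum-level predictions are combined into a patient-level decision, so the majority vote used here is an assumption, and the patient IDs, labels, and function names are hypothetical toy values rather than study data.

```python
from collections import Counter, defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, positive):
    """F1 score with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_patient_predictions(patient_ids, spectrum_preds):
    """Collapse spectrum-level predictions to one label per patient.

    Assumption: a simple majority vote; the paper may aggregate differently.
    """
    votes = defaultdict(list)
    for pid, pred in zip(patient_ids, spectrum_preds):
        votes[pid].append(pred)
    return {pid: Counter(v).most_common(1)[0][0] for pid, v in votes.items()}


# Hypothetical toy data: several spectra per patient, one true label each.
patient_ids = ["p1", "p1", "p1", "p2", "p2", "p2", "p3", "p3"]
spectrum_true = ["female"] * 3 + ["male"] * 3 + ["female"] * 2
spectrum_pred = ["female", "female", "male",
                 "male", "male", "male",
                 "female", "female"]

# Per-spectrum view: every spectrogram counts as one prediction.
spectrum_accuracy = accuracy(spectrum_true, spectrum_pred)
f1_female = f1_for_class(spectrum_true, spectrum_pred, "female")
f1_male = f1_for_class(spectrum_true, spectrum_pred, "male")

# Per-patient view: one voted prediction per speaker.
patient_true = dict(zip(patient_ids, spectrum_true))
patient_pred = per_patient_predictions(patient_ids, spectrum_pred)
patients = sorted(patient_true)
patient_accuracy = accuracy([patient_true[p] for p in patients],
                            [patient_pred[p] for p in patients])

print(f"per-spectrum accuracy: {spectrum_accuracy:.3f}")
print(f"per-patient accuracy:  {patient_accuracy:.3f}")
print(f"F1 female: {f1_female:.3f}, F1 male: {f1_male:.3f}")
```

In this toy case a single misclassified spectrum lowers the per-spectrum accuracy but is outvoted at the patient level, which is why the two metrics can differ even though the study reports 92% for both.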
