Abstract

Artificial Intelligence-enabled recruitment systems have become an important component of modern talent recruitment, particularly through social networks such as LinkedIn and Facebook. However, the data flowing through recruitment systems that rely on Natural Language Processing (NLP) methods may introduce unconscious gender bias. The purpose of this work is to apply a set of methods for analyzing and detecting textual bias across different groups. We analyzed a training dataset of fourteen thousand LinkedIn profiles, provided by a company named Talenya, of job candidates who fit IT-related positions. To detect gender gaps in textual self-presentation, we encoded the profiles with Term Frequency-Inverse Document Frequency (TF-IDF), word2vec, and the Universal Sentence Encoder (USE), and applied the kernel two-sample test to determine whether men’s and women’s LinkedIn profiles follow the same distribution. Additionally, we identified and quantified repetition in how skills are presented in LinkedIn profiles using TF-IDF and cosine similarity, and compared the repetitiveness patterns of men’s and women’s profiles. Gender-based analysis was also carried out on smaller, more homogeneous groups of candidates who share the same position type, geographical location, and organizational seniority level. Finally, we present theoretical and practical implications.
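
The Python sketch below illustrates, under stated assumptions, the kind of pipeline the abstract describes: profiles are encoded with TF-IDF, the two groups are compared with an RBF-kernel Maximum Mean Discrepancy (MMD) two-sample test using a permutation p-value, and within-profile repetitiveness is measured as the mean pairwise cosine similarity of a profile’s sentences. The toy profile texts, the kernel bandwidth, and the helper names are hypothetical; this is not the paper’s actual data or implementation.

```python
# Minimal sketch of the kind of analysis described in the abstract.
# Assumptions: TF-IDF encoding, RBF-kernel MMD two-sample test with a
# permutation p-value, and mean pairwise cosine similarity as a
# repetitiveness score. All texts below are invented toy examples.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import rbf_kernel, cosine_similarity

def mmd2(X, Y, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel."""
    Kxx = rbf_kernel(X, X, gamma=gamma)
    Kyy = rbf_kernel(Y, Y, gamma=gamma)
    Kxy = rbf_kernel(X, Y, gamma=gamma)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

def permutation_test(X, Y, n_perm=1000, gamma=1.0, seed=0):
    """Permutation p-value for H0: X and Y come from the same distribution."""
    rng = np.random.default_rng(seed)
    observed = mmd2(X, Y, gamma)
    pooled = np.vstack([X, Y])
    n, count = len(X), 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        count += mmd2(pooled[idx[:n]], pooled[idx[n:]], gamma) >= observed
    return (count + 1) / (n_perm + 1)

def repetitiveness(sentences, vectorizer):
    """Mean pairwise cosine similarity of a profile's sentences (higher = more repetitive)."""
    sims = cosine_similarity(vectorizer.transform(sentences))
    iu = np.triu_indices_from(sims, k=1)
    return sims[iu].mean() if iu[0].size else 0.0

# Toy stand-ins for profile texts (the real LinkedIn dataset is proprietary).
group_a = ["python developer, cloud, devops", "java backend engineer, microservices"]
group_b = ["data scientist, python, machine learning", "frontend developer, react, typescript"]

vec = TfidfVectorizer().fit(group_a + group_b)
X, Y = vec.transform(group_a).toarray(), vec.transform(group_b).toarray()
print("MMD^2:", mmd2(X, Y), " p-value:", permutation_test(X, Y, n_perm=200))
print("repetitiveness:", repetitiveness(["python, python, python", "python developer"], vec))
```

In this setup, a small permutation p-value would indicate that the two groups’ profile representations are unlikely to come from the same distribution; the same encodings could be swapped for word2vec or USE vectors.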
