Much attention has been paid to the behavior of computer vision services when describing images of people. Audits have revealed rampant biases that could lead to harm when such services are used by developers and researchers. We focus on temporal auditing, replicating experiments originally conducted three years ago, and document the changes observed over time, relating them to the growing awareness of structural oppression and the need to align technology with social values. While we document some positive changes in the services’ behavior, such as increased overall accuracy in the use of gender-related tags, we also replicate earlier findings of higher error rates for images of Black individuals. In addition, we find cases of increased use of inferential tags (e.g., emotions), which are often sensitive. Our analysis underscores the difficulty of tracking changes in services’ behavior over time and the need for greater oversight of such services.