Abstract

Many online services, such as search engines, social media platforms, and digital marketplaces, are advertised as being available to any user, regardless of their age, gender, or other demographic factors. However, there are growing concerns that these services may systematically underserve some groups of users. In this paper, we present a framework for internally auditing such services for differences in user satisfaction across demographic groups, using search engines as a case study. We first explain the pitfalls of naïvely comparing the behavioral metrics that are commonly used to evaluate search engines. We then propose three methods for measuring latent differences in user satisfaction from observed differences in evaluation metrics. To develop these methods, we drew on ideas from the causal inference literature and the multilevel modeling literature. Our framework is broadly applicable to other online services, and provides general insight into interpreting their evaluation metrics.

Highlights

  • Modern search engines are complex, relying heavily on machine learning methods to optimize search results for user satisfaction

  • Search engines are often evaluated using metrics based on behavioral signals, yet several studies have suggested that these metrics are sensitive to a variety of factors: Hassan and White [26] demonstrated that evaluation metric values vary dramatically by user; Carterette et al. [10] made a similar observation and incorporated user variability into evaluation metrics; and Borisov et al. [8] studied the degree to which metrics are sensitive to a user's search context

  • Auditing search engines for equal access is much more complicated than comparing evaluation metrics for demographically binned search impressions. We addressed this challenge by proposing three methods for measuring latent differences in user satisfaction from observed differences in evaluation metrics
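To make the first of these methods concrete, the following sketch illustrates the idea behind context matching on entirely synthetic data. The impression records, demographic groups "A" and "B", query/intent pairs, and metric values below are all hypothetical stand-ins, not the paper's actual data or metric; the point is only that metric values are compared within identical (query, intent) contexts before being aggregated.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical impression records: (query, intent, group, metric_value).
# The metric values are synthetic; in practice this would be a behavioral
# satisfaction metric computed per search impression.
random.seed(0)
contexts = [("weather", "navigational"),
            ("retirement planning", "informational"),
            ("news", "navigational")]
impressions = []
for query, intent in contexts:
    for group in ("A", "B"):
        for _ in range(50):
            base = 0.5 if query == "retirement planning" else 0.7
            impressions.append((query, intent, group,
                                base + random.gauss(0, 0.1)))

# Context matching: compare metric values only within identical
# (query, intent) contexts, then average the per-context gaps.
by_context = defaultdict(lambda: defaultdict(list))
for query, intent, group, value in impressions:
    by_context[(query, intent)][group].append(value)

gaps = [mean(groups["A"]) - mean(groups["B"])
        for groups in by_context.values()
        if "A" in groups and "B" in groups]
matched_gap = mean(gaps)
print(round(matched_gap, 3))
```

Because both groups here are drawn from the same per-context distribution, the matched gap comes out near zero even though the raw metric varies across queries; a genuine satisfaction difference would survive the matching.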


Summary

INTRODUCTION

Modern search engines are complex, relying heavily on machine learning methods to optimize search results for user satisfaction. One way to assess whether a search engine provides equal access is to look for differences in user satisfaction across demographic groups. Doing so naïvely can mislead: for example, averaging an evaluation metric over all users will underemphasize the search engine's effectiveness on retirement planning queries if those queries are issued predominantly by one group. Our first method, context matching, controls for two confounding contextual differences: the query itself and the intent of the user (section 5). Because this method attempts to match users' search contexts as closely as possible, it can only be applied to a restricted set of queries. Our second method is a multilevel model for the effect of query difficulty on evaluation metrics (section 6). This method controls for fewer confounding factors, but is more generalizable. For comparison, we used our third method to conduct an external audit of a leading competitor to Bing using publicly available data from comScore (section 8).
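The intuition behind adjusting for query difficulty can be sketched as follows. This is a deliberately simplified, fixed-effects stand-in for the paper's multilevel model, on invented data: group "A" mostly issues an easy query, group "B" mostly a hard one, and conditional on the query both groups are equally well served. The queries, groups, and metric values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical impressions: (query, demographic_group, metric_value).
impressions = [
    ("weather", "A", 0.9), ("weather", "A", 0.8), ("weather", "B", 0.85),
    ("retirement planning", "A", 0.4),
    ("retirement planning", "B", 0.5), ("retirement planning", "B", 0.3),
]

# Naive comparison: average the metric per group and difference them.
by_group = defaultdict(list)
for _, group, value in impressions:
    by_group[group].append(value)
naive_gap = mean(by_group["A"]) - mean(by_group["B"])  # ~0.15, confounded

# Adjusted comparison: estimate each query's difficulty as its overall
# mean metric value, then compare residuals across groups, so that a
# group issuing harder queries is not mistaken for a less satisfied one.
by_query = defaultdict(list)
for query, _, value in impressions:
    by_query[query].append(value)
difficulty = {q: mean(vs) for q, vs in by_query.items()}

residuals = defaultdict(list)
for query, group, value in impressions:
    residuals[group].append(value - difficulty[query])
adjusted_gap = mean(residuals["A"]) - mean(residuals["B"])  # ~0.0

print(naive_gap, adjusted_gap)
```

Here the naive gap of about 0.15 is driven entirely by the groups' different query mixes; after removing the per-query difficulty component, the apparent satisfaction difference vanishes. The paper's actual multilevel model additionally treats query effects as random effects rather than point estimates.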

Fairness in Machine Learning
Demographics and Web Search
User Satisfaction in Web Search
DATA AND METRICS
Differences in Queries
Differences in Evaluation Metrics
CONTEXT MATCHING
MULTILEVEL MODELING
ESTIMATING DIFFERENCES
EXTERNAL AUDITING
DISCUSSION
Findings
REFERENCES
