Abstract

Anti-discrimination law in many jurisdictions effectively bans the use of race and gender in automated decision-making. For example, under such laws an insurance company may not explicitly ask about legally protected attributes, e.g., race, in order to tailor its premiums to particular customers. In legal terms, indirect discrimination occurs when a generally neutral rule or variable is used but places one demographic group at a significant disadvantage. An emerging example of this concern is the inclusion of proxy variables in Machine Learning (ML) models, i.e., seemingly neutral variables that are predictive of protected attributes. For example, postcodes or zip codes stand in for communities, and therefore for racial demographics and socio-economic class; this is the traditional practice of 'redlining', which pre-dates modern automated techniques [1]. The law struggles with proxy variables in machine learning: indirect discrimination cases are difficult to bring to court, particularly because it is hard to assemble substantial evidence showing the indirect discrimination to be unlawful [2]. As more complex machine-learning models, e.g., random forests or state-of-the-art deep neural networks, are developed for automated decision-making, ever more data points on customers are accumulated from a wide variety of sources [1]. With such rich data, ML models can produce multiple interconnected correlations (such as those captured by individual neurons in a neural network, or individual decision trees in a random forest) that are predictive of protected attributes, akin to traditional uses of discrete proxy variables. In this poster, we introduce the concept of "emerging proxies": combinations of several variables from which an ML model can infer the protected attribute(s) of the individuals in the dataset. This differs from the traditional notion of a proxy because, rather than a single proxy variable, a distribution of interconnected proxies would have to be addressed. Our contribution is to provide evidence of the capacity of complex ML models to identify protected attributes through correlations among other variables; this correlation is not an explicit, discrete one-to-one relationship between variables, but a many-to-one relationship. This contribution complements concerns raised in legal analyses of automated decision-making that proxies in ML models can lead to indirect discrimination [3]. We show that if an ML model contains "emerging proxies" for a protected attribute, the distributed nature of those proxies becomes a roadblock when attempting to de-bias the model, limiting the pathways available for addressing potential discrimination caused by the model.
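To make the many-to-one relationship concrete, the following is a minimal illustrative sketch, not the experimental setup of this work: the synthetic data, the number of features, and the choice of a random forest are assumptions made purely for demonstration. Each "neutral" variable is only weakly informative about a hidden protected attribute on its own, yet a model trained on all of them together recovers the attribute much more accurately, behaving as an emerging proxy.

```python
# Illustrative sketch (assumed synthetic data and model, not the poster's actual setup):
# several "neutral" features each correlate only weakly with a protected attribute,
# but a model combining them recovers that attribute far more accurately.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 20_000
protected = rng.integers(0, 2, size=n)  # hidden protected attribute (binary, for illustration)

# Five "neutral" variables (e.g., postcode cluster, spending pattern, ...),
# each only weakly shifted by the protected attribute plus independent noise.
X = np.column_stack([protected * 0.5 + rng.normal(0, 1, n) for _ in range(5)])

X_tr, X_te, y_tr, y_te = train_test_split(X, protected, random_state=0)

# Any single variable is a poor proxy on its own ...
for j in range(X.shape[1]):
    single = RandomForestClassifier(n_estimators=100, random_state=0)
    single.fit(X_tr[:, [j]], y_tr)
    auc = roc_auc_score(y_te, single.predict_proba(X_te[:, [j]])[:, 1])
    print(f"feature {j} alone: AUC = {auc:.2f}")

# ... but taken together they act as an "emerging proxy" for the protected attribute.
combined = RandomForestClassifier(n_estimators=100, random_state=0)
combined.fit(X_tr, y_tr)
auc_all = roc_auc_score(y_te, combined.predict_proba(X_te)[:, 1])
print(f"all features together: AUC = {auc_all:.2f}")
```

Because the predictive signal is spread across the whole feature set rather than concentrated in one column, dropping any single variable (as one would with a traditional proxy) leaves the combined model's ability to infer the protected attribute largely intact, which is the de-biasing roadblock described above.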
