Abstract

Graph neural networks (GNNs) have shown great power in modeling graph-structured data. However, similar to other machine learning models, GNNs may make biased predictions with respect to protected sensitive attributes, e.g., skin color and gender. This is because machine learning algorithms, including GNNs, are trained to reflect the distribution of the training data, which often contains historical bias toward sensitive attributes. In addition, we empirically show that the discrimination in GNNs can be magnified by graph structures and the message-passing mechanism of GNNs. As a result, the application of GNNs in high-stakes domains such as crime rate prediction would be largely limited. Though extensive studies of fair classification have been conducted on independently and identically distributed (i.i.d.) data, methods for addressing discrimination on non-i.i.d. data remain rather limited. Generally, learning fair models requires abundant sensitive attributes to regularize the model. However, for many graphs such as social networks, users are reluctant to share sensitive attributes, so only limited sensitive attributes are available for fair GNN training in practice. Moreover, directly collecting and applying sensitive attributes in fair model training may cause privacy issues, because the sensitive information can be leaked in data breaches or through attacks on the trained model. Therefore, we study a novel and important problem of learning fair GNNs with a limited number of private sensitive attributes, i.e., sensitive attributes that are processed with a privacy-preserving mechanism. To address this problem, we propose FairGNN, which eliminates the bias of GNNs while maintaining high node classification accuracy by leveraging graph structures and limited sensitive information. To further preserve privacy, private sensitive attributes with a privacy guarantee are obtained by injecting noise based on local differential privacy. We further extend FairGNN to NT-FairGNN, which handles limited and private sensitive attributes to simultaneously achieve fairness and preserve privacy. Theoretical analysis and extensive experiments on real-world datasets demonstrate the effectiveness of FairGNN and NT-FairGNN in achieving fair and highly accurate classification.
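The abstract only names the noising mechanism. As one concrete illustration, below is a minimal sketch of randomized response, a standard local differential privacy mechanism consistent with the description of injecting noise into sensitive attributes before training. The function name `randomized_response`, the binary-attribute assumption, and the `epsilon` parameter are illustrative choices, not details taken from the paper.

```python
import numpy as np

def randomized_response(s, epsilon, rng=None):
    """Privatize a binary sensitive attribute array `s` (values in {0, 1})
    via randomized response, a standard epsilon-LDP mechanism.

    Each entry is kept with probability p = e^eps / (e^eps + 1) and
    flipped otherwise, so the keep/flip probability ratio is e^eps,
    which is the epsilon-local-differential-privacy guarantee for a
    binary attribute. (Illustrative sketch; not the paper's code.)
    """
    rng = np.random.default_rng() if rng is None else rng
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    keep = rng.random(len(s)) < p_keep
    return np.where(keep, s, 1 - s)

# Example: privatize the sensitive attributes of five users.
s_true = np.array([0, 1, 1, 0, 1])
s_private = randomized_response(s_true, epsilon=1.0)
```

Under this mechanism, a smaller epsilon yields noisier (more private) attributes, so the fairness regularizer sees only a privatized view of each user's sensitive attribute.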
