Abstract

One direct application of explainable AI feature attribution methods is the detection of unwanted biases. To do so, domain experts typically have to review explained inputs, checking for the presence of unwanted biases learnt by the model. However, the number of samples domain experts must review grows with the dataset, making this task increasingly challenging. Ideally, domain experts would be provided with only a small number of selected samples containing potential biases. The recently published Focus score is a promising tool for selecting such samples. In this work, we conduct a first study in this direction, analyzing the behavior of the Focus score when applied to a biased model. First, we verify that Focus is indeed sensitive to an induced bias. We force a spurious correlation by training a model only on cat-indoor and dog-outdoor images, and we empirically show that the model learnt to distinguish the contexts (indoor vs. outdoor) rather than the cat and dog classes, confirming that it learnt an unwanted bias. We then apply Focus to this biased model, showing that the Focus score decreases when the input contains the aforementioned bias. This analysis sheds light on the behavior of Focus when applied to a biased model, highlighting its strengths for bias detection.
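
To make the quantity concrete, below is a minimal NumPy sketch of a Focus-style computation, assuming the published definition of Focus as the fraction of positive relevance mass that falls on the target-class region of a mosaic image. The names (`focus_score`, `target_mask`) and the toy attribution map are illustrative, not taken from the paper's code.

```python
import numpy as np

def focus_score(attribution, target_mask):
    """Fraction of positive attribution mass inside the target-class region.

    attribution : 2D array of relevance values (e.g. from Grad-CAM or LRP)
    target_mask : boolean 2D array, True where the mosaic shows the target class
    """
    positive = np.clip(attribution, 0.0, None)  # keep only positive relevance
    total = positive.sum()
    if total == 0:
        return 0.0  # degenerate case: no positive relevance anywhere
    return float(positive[target_mask].sum() / total)

# Toy usage: a 4x4 "mosaic" whose left half belongs to the target class.
attr = np.array([[0.2, 0.1,  0.0, 0.3],
                 [0.4, 0.0, -0.1, 0.2],
                 [0.1, 0.3,  0.0, 0.0],
                 [0.0, 0.2,  0.1, 0.1]])
mask = np.zeros((4, 4), dtype=bool)
mask[:, :2] = True  # left half = target-class region

print(focus_score(attr, mask))  # 0.65: most relevance lands on the target
```

Under this definition, a score near 0.5 means the relevance is spread evenly between target and non-target regions, so scores well below that on inputs that break the spurious correlation (e.g. cat-outdoor) are the kind of drop the abstract describes.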
