Training Data Improvement by Automatic Generation of Semantic Networks for Bias Mitigation

Roman Englert, Jörg Muschiol

doi:10.11648/j.ajist.20220601.11

Open Access

https://doi.org/10.11648/j.ajist.20220601.11

Abstract

The significance of bias detection has increased appreciably with the growing application of AI. Although syntactic bias is well explored with statistical techniques, semantic bias remains a challenge; one example is Google's face recognition, which excludes people of color. Human expertise is required to detect semantic bias, e.g., when applying the root-out-bias method. We propose to further automate this laborious method with Training Data Improvement for Bias Mitigation (TDIBM). The concept is to automatically construct a Semantic Network (SN) from the domain description of the training data. As a first step, nouns are extracted from the description for the SN. As a second step, synonyms and semantically similar nouns are searched, e.g., in dictionaries, and added to the SN. As a result, the SN contains nouns that enhance the given domain with previously unknown knowledge. This SN can be used to check, e.g., with the root-out-bias method, whether the training sample is biased or not. Should the training sample be biased, the corresponding nouns from the SN can be added to the training sample set to mitigate the bias. The newly developed method TDIBM is evaluated twofold: firstly, with the description of the COMPAS system, a case management and decision support tool used by U.S. courts to assess the likelihood of a defendant becoming a recidivist; secondly, with an autonomous driving domain, to investigate accidents involving a Tesla car. Here TDIBM detected many new features, including one to resolve ambiguous scene interpretations for autonomous vehicles.
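The two construction steps and the subsequent check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the noun lexicon, the synonym dictionary, and the function names `extract_nouns`, `build_semantic_network`, and `missing_concepts` are hypothetical stand-ins. A real system would presumably use a part-of-speech tagger and an actual dictionary or thesaurus, and the final set difference is only a crude proxy for the root-out-bias check, which relies on human expertise.

```python
import re

# Hypothetical toy resources (assumptions for illustration only).
NOUN_LEXICON = {"court", "defendant", "likelihood", "recidivist", "tool"}
SYNONYMS = {
    "defendant": {"accused", "suspect"},
    "recidivist": {"reoffender"},
    "court": {"tribunal"},
}


def extract_nouns(description):
    """Step 1: return the lexicon nouns that occur in the domain description."""
    tokens = re.findall(r"[a-z]+", description.lower())
    nouns = set()
    for tok in tokens:
        # Crude plural handling: also try the token without a trailing "s".
        singular = tok[:-1] if tok.endswith("s") else tok
        for candidate in (tok, singular):
            if candidate in NOUN_LEXICON:
                nouns.add(candidate)
    return nouns


def build_semantic_network(description):
    """Step 2: map each extracted noun to its synonyms or similar nouns."""
    return {noun: set(SYNONYMS.get(noun, ())) for noun in extract_nouns(description)}


def missing_concepts(network, training_vocabulary):
    """Return SN nouns that do not appear in the training sample vocabulary."""
    all_nouns = set(network)
    for related in network.values():
        all_nouns |= related
    return all_nouns - set(training_vocabulary)


description = (
    "A decision support tool used by courts to assess the likelihood "
    "of a defendant becoming a recidivist."
)
sn = build_semantic_network(description)
training_vocabulary = {"court", "defendant", "tool", "likelihood", "recidivist"}
gaps = missing_concepts(sn, training_vocabulary)
print(sorted(gaps))  # ['accused', 'reoffender', 'suspect', 'tribunal']
```

In this toy setting, the nouns reported as gaps are candidates that could be added to the training sample set, mirroring the mitigation step described above.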

Full Text