Abstract
The past few years have witnessed an explosion of attention to the bias displayed by Machine Learning (ML) techniques towards different groups of people (e.g., female vs. male). Although ML techniques have been widely adopted in education, it remains largely unexplored to what extent such bias manifests itself in this specific setting and how it can be reduced or eliminated. Given the increasing importance of ML techniques in empowering educators to teach effectively, this study aimed to quantify the characteristics of the original datasets that may be correlated with the subsequent predictive unfairness displayed by ML models. To this end, we empirically investigated two types of data bias (i.e., distribution bias and hardness bias) towards students of different sexes and first-language backgrounds across five frequently performed predictive tasks in education. Then, to improve ML fairness, we drew inspiration from the well-established research on Class Balancing Techniques (CBTs), in which samples are generated or removed to alleviate the predictive disparity between prediction classes. We proposed two simple but effective strategies that empower class balancing techniques to alleviate data biases and improve prediction fairness. Through extensive analyses and evaluations, we demonstrated that ML models can greatly improve prediction fairness (by up to 66%) with only a small sacrifice (less than 1%) in prediction accuracy when the training data are balanced using students’ demographic information and the overall hardness bias measure. All data and code used in this study are publicly accessible via https://github.com/lsha49/FairEdu.
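To illustrate the general idea of balancing training data using demographic information, the sketch below randomly oversamples each (class, demographic group) cell to the size of the largest cell. This is a minimal, hypothetical implementation of demographic-aware class balancing written for illustration only; the strategies and the hardness bias measure proposed in the paper are defined in the linked repository, and the keys `label` and `sex` here are assumed field names.

```python
import random
from collections import defaultdict

def balance_by_class_and_group(samples, label_key="label", group_key="sex", seed=0):
    """Randomly oversample every (class, demographic group) cell up to the
    size of the largest cell -- a simple sketch of class balancing that
    conditions on a sensitive attribute rather than on class alone."""
    rng = random.Random(seed)
    # Partition samples into cells keyed by (prediction class, group).
    cells = defaultdict(list)
    for s in samples:
        cells[(s[label_key], s[group_key])].append(s)
    target = max(len(cell) for cell in cells.values())
    balanced = []
    for cell in cells.values():
        balanced.extend(cell)
        # Draw extra samples with replacement until the cell reaches `target`.
        balanced.extend(rng.choices(cell, k=target - len(cell)))
    return balanced

# Toy dataset: class 1 is under-represented for group "F".
data = (
    [{"label": 1, "sex": "F"}] * 2
    + [{"label": 1, "sex": "M"}] * 6
    + [{"label": 0, "sex": "F"}] * 4
    + [{"label": 0, "sex": "M"}] * 4
)
balanced = balance_by_class_and_group(data)
```

After balancing, every (class, group) cell contains the same number of samples, so a model trained on the result sees no distribution bias across classes or demographic groups; random oversampling stands in here for whatever generation/removal scheme a particular CBT uses.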