A SHAP-based Active Learning Approach for Creating High-Quality Training Data

Nailcan Kara, Gokce Ezeroglu, Omer Burak Akgun, Serdar Mola, Yagiz Levent Gume, Arzucan Ozgur, Umit Tigrak

doi:10.1109/bigdata55660.2022.10020327

Abstract

Machine learning-based text classification models require labeled data for training. However, manual labeling is costly and time-consuming. The task is particularly difficult in domains such as banking, where privacy laws generally prohibit outsourcing data labeling. We propose a novel active learning approach in which the most difficult instances in the pool of unlabeled data are selected, based on the Shapley Additive Explanations (SHAP) values of the words in the texts to be classified, and passed to human annotators for labeling. At each iteration of this human-in-the-loop strategy, the newly labeled instances are added to the training set. We demonstrate the effectiveness of this approach on the classification of customer comments from banking-domain surveys. Our experiments indicate that expanding the training set with the proposed approach yields better results than a baseline strategy that expands it with randomly selected instances. Further analysis shows that the performance gap between the two approaches widens as class imbalance increases. This study suggests that human-in-the-loop active learning is a powerful strategy for creating high-quality training datasets, as it makes effective use of human annotation effort.
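The abstract does not state how word-level SHAP values are turned into a "difficulty" score, so the following is only a minimal sketch under loudly stated assumptions, not the authors' method. It assumes a binary linear bag-of-words classifier, for which the exact SHAP value of word j under feature independence is phi_j = w_j * (x_j - E[x_j]). As the selection criterion it uses a hypothetical "conflict" score: how strongly the positive and negative word contributions cancel each other out. All names (`linear_shap`, `conflict_score`, `select_difficult`, `active_learning_loop`, `fit`, `oracle`) are illustrative; `fit` stands in for whatever classifier is retrained at each iteration, and `oracle` stands in for the human annotator.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

def linear_shap(x: Sequence[float], weights: Sequence[float],
                background_mean: Sequence[float]) -> Vector:
    # For a linear model f(x) = w.x + b with independent features, the exact
    # SHAP value of feature j is w_j * (x_j - E[x_j]).
    return [w * (xj - mu) for w, xj, mu in zip(weights, x, background_mean)]

def conflict_score(phi: Sequence[float], eps: float = 1e-12) -> float:
    # HYPOTHETICAL difficulty score: 1.0 when word contributions cancel out
    # completely, 0.0 when they all push the prediction the same way.
    total_abs = sum(abs(p) for p in phi)
    if total_abs < eps:
        return 1.0  # no word carries evidence: treat as maximally difficult
    return 1.0 - abs(sum(phi)) / total_abs

def column_means(rows: Sequence[Sequence[float]]) -> Vector:
    # Background expectation E[x], estimated from the labeled training rows.
    return [sum(col) / len(rows) for col in zip(*rows)]

def select_difficult(pool: Sequence[Sequence[float]], weights: Sequence[float],
                     background_mean: Sequence[float], k: int) -> List[int]:
    # Score every unlabeled instance and return the indices of the k most
    # difficult ones (ties broken by lower index).
    scored = [(conflict_score(linear_shap(x, weights, background_mean)), i)
              for i, x in enumerate(pool)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]

def active_learning_loop(
    labeled: List[Tuple[Vector, int]],
    pool: List[Vector],
    fit: Callable[[List[Tuple[Vector, int]]], Vector],  # hypothetical trainer -> weights
    oracle: Callable[[Vector], int],                    # stands in for the human annotator
    rounds: int,
    k: int,
) -> Tuple[List[Tuple[Vector, int]], List[Vector]]:
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        weights = fit(labeled)  # retrain on the current training set
        background = column_means([x for x, _ in labeled])
        chosen = select_difficult(pool, weights, background, k)
        # Pop from the highest index down so the remaining indices stay valid.
        for i in sorted(chosen, reverse=True):
            x = pool.pop(i)
            labeled.append((x, oracle(x)))  # newly labeled instance joins the training set
    return labeled, pool
```

The random baseline mentioned in the abstract would, in this sketch, simply replace `select_difficult` with `random.sample(range(len(pool)), k)`. A non-linear classifier would need a general SHAP explainer in place of the closed-form `linear_shap`.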


Similar Papers

Using Active Learning to Improve Distantly Supervised Entity Typing in Multi-source Knowledge Bases
Bo Xu ... Xiangsan Zhao
01 Jan 2020

Dirichlet Process Based Active Learning and Discovery of Unknown Classes for Hyperspectral Image Classification
Hao Wu ... Saurabh Prasad
IEEE Transactions on Geoscience and Remote Sensing | VOL. 54
01 Aug 2016

A machine learning-based prediction model pre-operatively for functional recovery after 1-year of hip fracture surgery in older people.
Chun Lin ... Zhen Liang
Frontiers in Surgery | VOL. 10
07 Jun 2023

Machine learning for prediction of delirium in patients with extensive burns after surgery.
Yujie Ren ... Xing Cheng
CNS Neuroscience & Therapeutics | VOL. 29
30 Apr 2023
