Automatically Correcting Noisy Labels for Improving Quality of Training Set in Domain-specific Sentiment Classification

Thananchai Khamket, Jantima Polpinij

doi:10.55003/cast.2022.02.23.006

Abstract

Label noise in the training set can degrade the performance of a classification model. Sentiment classification is affected by this problem because customer reviews can be mislabeled: some customers give a product or service a rating score that is inconsistent with the content of their review. Business owners who look only at the overall rating picture, mislabeled reviews included, may make erroneous business decisions. Addressing this issue was the main challenge of this study. We assumed that if customer reviews with noisy labels in the training data are validated and corrected before the learning process, the training set will yield a predictive model that returns better results for sentiment analysis or classification. We therefore proposed a mechanism, called the polarity label analyzer, to improve the quality of a training set with noisy labels before the learning process. The polarity label analyzer assigns a polarity class to each sentence in a customer review, and the polarity class of the whole review is then decided by voting. In our experiment, datasets were downloaded from TripAdvisor, and two linguistic experts assigned the correct labels to the customer reviews as the ground truth. Sentiment classifiers were developed using the k-NN, Logistic Regression, XGBoost, Linear SVM, and CNN algorithms. Compared with the sentiment classifiers built without training set improvement, our proposed method improved the average F1 and accuracy scores by 20.59%.
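
The sentence-level labeling and voting step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how sentence polarity is determined, so the tiny word lists, the function names (`sentence_polarity`, `review_polarity`, `correct_labels`), and the rule of keeping the original label on a tie are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption); the abstract does not describe
# how the polarity label analyzer scores individual sentences.
POSITIVE = {"great", "clean", "friendly", "excellent", "comfortable", "good"}
NEGATIVE = {"dirty", "rude", "noisy", "terrible", "bad", "broken"}


def sentence_polarity(sentence):
    """Label one sentence by counting positive and negative lexicon words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def review_polarity(review, fallback):
    """Decide the polarity of a review by majority vote over its sentences."""
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    votes = Counter(sentence_polarity(s) for s in sentences)
    if votes["positive"] > votes["negative"]:
        return "positive"
    if votes["negative"] > votes["positive"]:
        return "negative"
    # Tie or no polar sentences: keep the original label (an assumption).
    return fallback


def correct_labels(reviews, labels):
    """Return labels in which each review carries its voted polarity."""
    return [review_polarity(r, l) for r, l in zip(reviews, labels)]


reviews = [
    "The room was dirty. The staff were rude. Breakfast was good.",
    "Great location. Friendly staff.",
]
noisy_labels = ["positive", "positive"]
corrected = correct_labels(reviews, noisy_labels)
print(corrected)
```

In this toy run the first review has two negative sentences and one positive sentence, so its label is changed from positive to negative, while the second review keeps its positive label. The corrected labels would then replace the noisy ones before any classifier is trained.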

Similar Papers

Skin Lesion Segmentation in Dermoscopic Images with Noisy Data.
Norsang Lama ... William Van Stoecker
Journal of digital imaging | VOL. 36
05 Apr 2023

Customer sentiment analysis and prediction of halal restaurants using machine learning approaches
Md Shamim Hossain ... Mst Farjana Rahman
Journal of Islamic Marketing | VOL. 14
01 Jun 2022

A Progressive Deep Neural Network Training Method for Image Classification with Noisy Labels
Xuguo Yan ... Xuhui Xia
Applied Sciences | VOL. 12
12 Dec 2022

Robust Class-Specific Autoencoder for Data Cleaning and Classification in the Presence of Label Noise
Weining Zhang ... Dong Wang
Neural Processing Letters | VOL. 50
14 Dec 2018
