Classifying Complaint Reports Using RNN and Handling Imbalanced Dataset

Oktefvia Aruda Lisjana, Masayu Leylia Khodra

doi:10.1109/icitacee55701.2022.9923998

Abstract

A complaint report contains a citizen's complaint about problems in their area. The text of each report needs to be classified into a specific category so that the government can follow up on the information more easily. This study uses a dataset from Jakarta Smart City (JSC), namely Cepat Respon Masyarakat. JSC currently relies on annotators to classify report texts manually, which takes a considerable amount of time, so automatic classification is needed. We use two Recurrent Neural Network (RNN) methods: Bidirectional Long Short-Term Memory (Bi-LSTM) and Gated Recurrent Unit (GRU). Word2Vec and FastText are used for word embedding. Another problem in category classification is that the dataset is imbalanced, which calls for special handling using the Synthetic Minority Over-Sampling Technique (SMOTE) and class weights. In the classification experiments, the best model combined FastText embeddings, the GRU method, and SMOTE, achieving an accuracy of 0.78 and a macro F1-score of 0.52.
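The abstract does not give implementation details, so the following is only a rough, hypothetical illustration of the imbalance handling and the evaluation metric it names, written in plain Python with no third-party libraries. It shows (1) a class-weight scheme, assumed here to be the "balanced" heuristic weight_c = n_samples / (n_classes * count_c) that scikit-learn uses for class_weight="balanced", not necessarily the weighting used in the paper; (2) a minimal SMOTE that creates synthetic minority samples by interpolating between a minority vector and one of its k nearest minority neighbours, assuming each report is already a fixed-length vector (for example an averaged Word2Vec or FastText embedding); and (3) macro-averaged F1. The function names `balanced_class_weights`, `smote`, and `macro_f1` are hypothetical.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Assumed 'balanced' heuristic: weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def smote(X, y, target_class, n_new, k=5, seed=0):
    """Minimal SMOTE sketch: create n_new synthetic vectors for target_class.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Assumes X holds fixed-length numeric vectors (hypothetical input format).
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == target_class]
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two samples of the target class")
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 8:2 imbalanced labels with a classifier that always predicts the majority class.
    y_true = ["a"] * 8 + ["b"] * 2
    y_pred = ["a"] * 10
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print("class weights:", balanced_class_weights(y_true))
    print("accuracy:", accuracy)
    print("macro F1:", round(macro_f1(y_true, y_pred), 3))

    # Toy 2-D vectors: three minority ("b") points, two majority ("a") points.
    X = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
    y = ["b", "b", "b", "a", "a"]
    print("synthetic 'b' vectors:", smote(X, y, "b", n_new=3, k=2))
```

The toy run illustrates why the two reported numbers can diverge so much: always predicting the majority class of an 8:2 split gives an accuracy of 0.8 but a macro F1 of only about 0.44, because the minority class contributes an F1 of 0 and macro averaging weights every category equally. In a real pipeline, oversampling such as SMOTE is normally applied to the training split only, so that synthetic samples never leak into evaluation.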
