Abstract

This article describes the code used in a multi-topic labeling system. The code starts by loading the train, validation, and test datasets and then installing the pyarabic and Simple Transformers libraries. pyarabic allows the system to manipulate Arabic letters, while Simple Transformers is a Natural Language Processing (NLP) library designed to simplify the use of Transformer models without compromising on utility. The model used from the Simple Transformers library is the Multi-Label Classification Model with the model type "bert" and the model name "asafaya/bert-base-arabic". In multi-label text classification, the target for a single article (row) in the training dataset is a list of 10 distinct binary labels. A transformer-based multi-label text classification model typically consists of a transformer model with a classification layer on top of it. The results were impressive given the short training time of five minutes and 27 seconds for two epochs. The evaluation results were as follows: F1 macro 0.866, F1 micro 0.869, and 0.8468 on the Codalab competition website.
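The following is a minimal sketch of the pipeline described above, using the Simple Transformers MultiLabelClassificationModel with the "bert" model type and the "asafaya/bert-base-arabic" checkpoint. The file names, column layout (a "text" column plus 10 binary label columns), and the specific pyarabic normalization step (stripping diacritics) are assumptions for illustration, not details taken from the article.

```python
import pandas as pd
from pyarabic import araby
from simpletransformers.classification import MultiLabelClassificationModel
from sklearn.metrics import f1_score

# Load the splits (file names and column layout are hypothetical).
train_df = pd.read_csv("train.csv")
eval_df = pd.read_csv("validation.csv")

# Example use of pyarabic: strip diacritics from the Arabic article text.
# The exact manipulation applied in the original system is not specified.
train_df["text"] = train_df["text"].apply(araby.strip_tashkeel)
eval_df["text"] = eval_df["text"].apply(araby.strip_tashkeel)

# Simple Transformers expects a "labels" column holding, per row,
# a list of 10 binary values (the multi-label target for one article).
label_cols = [c for c in train_df.columns if c != "text"]
train_df["labels"] = train_df[label_cols].values.tolist()
eval_df["labels"] = eval_df[label_cols].values.tolist()

# Transformer encoder with a classification layer on top,
# trained for two epochs as reported in the abstract.
model = MultiLabelClassificationModel(
    "bert",
    "asafaya/bert-base-arabic",
    num_labels=10,
    args={"num_train_epochs": 2, "overwrite_output_dir": True},
)
model.train_model(train_df[["text", "labels"]])

# Predict on the validation texts and score with macro/micro F1
# (one plausible way to compute the reported metrics).
predictions, raw_outputs = model.predict(eval_df["text"].tolist())
y_true = eval_df["labels"].tolist()
print("F1 macro:", f1_score(y_true, predictions, average="macro"))
print("F1 micro:", f1_score(y_true, predictions, average="micro"))
```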
