Abstract

Automated text classification is the task of automatically assigning text documents to categories from a predefined set. The conventional approach maps each data point (instance) to a single class label. In multi-label classification (MLC), the task is to develop models that can predict multiple class labels for a single data instance. Several MLC methods exist, such as classifier chain (CC) and binary relevance (BR). However, these methods have drawbacks, such as the random label sequence ordering issue. This study addresses that issue, which is specific to the classifier chain method. In this paper, a hybrid heuristic evolutionary technique is proposed. The proposed method, PSOGCC, combines particle swarm optimization (PSO) and the genetic algorithm (GA). The genetic operators of GA are integrated into the basic PSO algorithm to find the global best solution, which represents an optimized label sequence order in the chain classifier. In the experiments, three MLC methods (BR, CC, and PSOGCC) are implemented and evaluated on five benchmark multi-label datasets using five standard evaluation metrics. The proposed PSOGCC method improved the predictive performance of the chain classifier, obtaining best results of 98.66% accuracy, 99.5% precision, 99.16% recall, 99.33% F1 score, and a Hamming loss of 0.0011.

Highlights

  • Automated text classification (ATC) is the task of developing predictive models capable of categorizing text documents into distinct class labels from a predefined set

  • Unlike the classical single-label classification (SLC) technique, where each instance of a data sample is associated with a single class label, multi-label classification (MLC) [4]–[6] assigns multiple class labels to a data point simultaneously

  • The existing MLC techniques could be broadly categorized into two approaches [6]: problem transformation and algorithm adaptation


Summary

INTRODUCTION

Automated text classification (ATC) is the task of developing predictive models capable of categorizing text documents into distinct class labels from a predefined set. In the problem transformation (PT) approach, a multi-label problem is transformed into multiple single-label problems, and a single-label classification (SLC) algorithm (classifier), such as a decision tree, is trained on each to model label membership. The algorithm adaptation (AA) approach instead adapts a conventional machine learning classification algorithm (a single-label classifier) to the multi-label problem. In the AA strategy, a learning algorithm (classifier) such as a support vector machine (SVM) is adapted and applied directly to MLC problems. This approach has been applied less often by researchers because of limitations such as lack of flexibility and complexity [8]. We propose an improved multi-label classifier chain method based on hybrid heuristic evolutionary techniques.
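To make the problem transformation idea concrete, the following minimal Python sketch shows how binary relevance and a classifier chain would each turn one multi-label dataset into several single-label training tasks. It is an illustration under stated assumptions and not the paper's implementation: the function names, the toy data, and the chosen label order are all hypothetical, and no classifier is actually trained.

```python
from typing import List, Sequence, Tuple

# A single-label task: (feature rows, binary targets). Hypothetical alias.
Task = Tuple[List[List[float]], List[int]]


def binary_relevance_tasks(X: Sequence[Sequence[float]],
                           Y: Sequence[Sequence[int]]) -> List[Task]:
    """Build one independent binary task per label (features are unchanged)."""
    n_labels = len(Y[0])
    return [([list(x) for x in X], [y[j] for y in Y]) for j in range(n_labels)]


def classifier_chain_tasks(X: Sequence[Sequence[float]],
                           Y: Sequence[Sequence[int]],
                           order: Sequence[int]) -> List[Task]:
    """Build one binary task per label, visited in `order`.

    Each task's features are extended with the true values of the labels
    that come earlier in the chain, so the tasks depend on the label order.
    """
    tasks = []
    for pos, j in enumerate(order):
        earlier = order[:pos]
        features = [list(x) + [y[k] for k in earlier] for x, y in zip(X, Y)]
        targets = [y[j] for y in Y]
        tasks.append((features, targets))
    return tasks


# Toy data (assumed): 3 instances, 2 features, 3 labels.
X = [[0.2, 1.0], [0.9, 0.1], [0.5, 0.5]]
Y = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]

br = binary_relevance_tasks(X, Y)
cc = classifier_chain_tasks(X, Y, order=[2, 0, 1])
```

In this sketch the binary relevance tasks are identical regardless of label order, whereas the chain tasks change whenever `order` changes. That dependence is why a randomly chosen label sequence can affect a classifier chain and why a search over label orderings is a natural target for optimization.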

RELATED WORKS
PSO Algorithm
METHODOLOGIES
EXPERIMENTS AND RESULTS
CONCLUSION
