Learning to rank for multi-label text classification: Combining different sources of information

Hosein Azarbonyad, Maarten Marx, Jaap Kamps, Mostafa Dehghani

doi:10.1017/s1351324920000029


Open Access

https://doi.org/10.1017/s1351324920000029


Journal: Natural Language Engineering	Publication Date: Feb 18, 2020
Citations: 23	License type: CC BY 4.0

Affiliation: University of Amsterdam

Abstract

Efficiently exploiting all sources of information, such as labeled instances, class representations, and the relations among them, has a high impact on the performance of Multi-Label Text Classification (MLTC) systems. Most current approaches use labeled documents as the primary source of information for MLTC. We investigate the effectiveness of different sources of information for MLTC, such as the labeled training data, the textual labels of classes, and the taxonomy relations of classes. More specifically, first, for each document–class pair, features are extracted from the different sources of information. The features reflect the similarity of classes and documents. Then, MLTC is treated as a ranking problem, and a learning to rank (LTR) approach is used to rank classes for each document and select the document's labels. An important characteristic of many MLTC instances is that documents can belong to multiple classes and there are implicit relations between classes. We apply score propagation on top of LTR to incorporate co-occurrence patterns of classes in labeled documents. Our main findings are the following. First, using an LTR approach integrating all features, we observe significantly better performance than previous systems for MLTC. Specifically, we show that simple classification approaches fail when there is a high number of classes. Second, the analysis of feature weights reveals the relative importance of various sources of evidence, also giving insight into the underlying classification problem. Interestingly, the results indicate that the titles of documents are more informative than all other sources of information. Third, a lean-and-mean system using only four features is able to perform at 96% of the large LTR model that we propose in this paper. Fourth, using the co-occurrence information of classes helps in classifying documents more accurately. Our results show that the co-occurrence information is more helpful when the underlying classifier performs poorly.
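The abstract does not give the score propagation formula, so the following is only a minimal sketch of one plausible form, under stated assumptions: each class's ranker score is blended with the scores of the other classes, weighted by a conditional co-occurrence probability estimated from training label sets. The function names, the blending weight `alpha`, the class names, and all scores are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_probs(label_sets):
    """Estimate P(c | other) = count(c and other together) / count(other)."""
    single = Counter()
    pair = Counter()
    for labels in label_sets:
        unique = set(labels)
        single.update(unique)
        pair.update(permutations(unique, 2))  # ordered pairs (c, other), c != other
    return {(c, other): n / single[other] for (c, other), n in pair.items()}


def propagate(scores, probs, alpha=0.3):
    """Blend each class score with the scores of classes it co-occurs with.

    alpha is an assumed mixing weight, not a value from the paper.
    """
    propagated = {}
    for c, s in scores.items():
        neighbour_mass = sum(
            probs.get((c, other), 0.0) * s_other
            for other, s_other in scores.items()
            if other != c
        )
        propagated[c] = (1 - alpha) * s + alpha * neighbour_mass
    return propagated


# Toy training label sets with hypothetical class names.
train_labels = [
    {"fisheries", "sea"},
    {"fisheries", "sea"},
    {"fisheries", "trade"},
    {"tax"},
]
probs = cooccurrence_probs(train_labels)

# Hypothetical ranker scores for a single document.
ltr_scores = {"fisheries": 0.9, "sea": 0.2, "trade": 0.25, "tax": 0.3}
new_scores = propagate(ltr_scores, probs)
ranking = sorted(new_scores, key=new_scores.get, reverse=True)
```

In this toy example, "sea" starts with the lowest ranker score but moves up to second place after propagation because it frequently co-occurs with the top-scoring class "fisheries", while "tax", which never co-occurs with anything, drops to last.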

Highlights

Multi-Label Text Classification (MLTC) is a supervised machine learning task in which the goal is to learn a classifier that assigns multiple labels to text documents (Herrera et al. 2016).
We evaluate the effectiveness of the learning to rank (LTR) approach, which integrates a variety of sources of information for MLTC, and look at the importance of the different features.
The LTR method significantly outperforms Support Vector Machines (SVM), BM25-TITLES, and JEX, demonstrating that the additional sources of information employed in LTR are effective for the MLTC task.

Summary

Introduction

Multi-Label Text Classification (MLTC) is a supervised machine learning task in which the goal is to learn a classifier that assigns multiple labels to text documents (Herrera et al. 2016). Learning to rank (LTR) has been shown to be an effective approach for MLTC. In this approach, a model is trained to rank classes for each document and select the top-k classes as the document's labels. Rather than creating and optimizing a separate model for each class and predicting the probability of assigning each class to the given document, the learning objective of the LTR approach for MLTC is to create a global ranking model that ranks all classes for a given document.
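To make the idea of a global ranking model concrete, here is a minimal sketch, assuming a linear scoring function: one shared weight vector scores every document-class pair, the classes are sorted by score, and the top-k are returned as labels. The feature names, weights, and values are hypothetical placeholders; in practice the weights would be learned by an LTR algorithm from labeled documents, and the summary does not say which algorithm the authors use.

```python
def score(features, weights):
    """Linear score of one document-class pair under a single shared weight vector."""
    return sum(weights[name] * value for name, value in features.items())


def top_k_labels(pair_features, weights, k):
    """Rank all classes for one document and return the k highest-scoring ones."""
    ranked = sorted(
        pair_features,
        key=lambda c: score(pair_features[c], weights),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical weights; an LTR algorithm would normally learn these.
weights = {"title_sim": 2.0, "text_sim": 1.0, "label_sim": 0.5}

# Hypothetical similarity features for one document against three classes.
pair_features = {
    "fisheries": {"title_sim": 0.8, "text_sim": 0.5, "label_sim": 0.6},
    "trade": {"title_sim": 0.1, "text_sim": 0.7, "label_sim": 0.2},
    "tax": {"title_sim": 0.0, "text_sim": 0.1, "label_sim": 0.1},
}

labels = top_k_labels(pair_features, weights, k=2)
```

Because the same weights apply to every class, adding a new class only requires computing its features against the document, not training a separate per-class model.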

