Abstract

Many science, technology and innovation (STI) resources carry several different labels, such as IPC and CPC codes for patents and PACS (Physics and Astronomy Classification Scheme) numbers for scientific publications. Assigning such labels is the well-known multi-label classification problem. Although a number of approaches and open-source tools in the literature work well on benchmark datasets, real-world data are more complex in terms of both the number and the hierarchy of labels. This work comprehensively compares the performance of three state-of-the-art tools, Dependency LDA, Scikit-Multilearn and NeuralClassifier, on the SciGraph academic resource dataset. We find that NeuralClassifier outperforms the other two tools in terms of Micro F1, Macro F1 and Hamming Loss on a dataset with an imbalanced label distribution, a more complex hierarchical structure and a larger label set. On the basis of this comparison, several directions for future work are suggested.
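The evaluation metrics named above (Micro F1, Macro F1 and Hamming Loss) are standard for multi-label classification. As a hedged illustration only, not the paper's actual evaluation code, the sketch below shows how they can be computed with scikit-learn on toy binary label-indicator matrices; all data here are invented for the example.

```python
import numpy as np
from sklearn.metrics import f1_score, hamming_loss

# Toy multi-label ground truth and predictions:
# 4 samples, 3 labels, encoded as binary indicator matrices.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])

# Micro F1: pools true/false positives and negatives over all labels,
# so frequent labels dominate the score.
micro_f1 = f1_score(y_true, y_pred, average="micro")

# Macro F1: averages per-label F1 scores, weighting rare and
# frequent labels equally -- sensitive to imbalanced distributions.
macro_f1 = f1_score(y_true, y_pred, average="macro")

# Hamming Loss: fraction of individual label assignments that are wrong.
h_loss = hamming_loss(y_true, y_pred)

print(micro_f1, macro_f1, h_loss)
```

The gap between Micro and Macro F1 is one way to see the effect of an imbalanced label distribution: Micro F1 rewards getting frequent labels right, while Macro F1 penalizes poor performance on rare labels.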
