Abstract

Many science, technology and innovation (STI) resources carry several different labels, such as IPC and CPC codes for patents and PACS (Physics and Astronomy Classification Scheme) numbers for scientific publications. Assigning such labels is the well-known multi-label classification problem. Although a number of approaches and open-source tools in the literature work well on benchmark datasets, real-world data are more complex in terms of both the number and the hierarchy of labels. This work comprehensively compares the performance of three state-of-the-art tools, Dependency LDA, Scikit-Multilearn and NeuralClassifier, on the SciGraph academic resource dataset. We find that NeuralClassifier outperforms the other two tools in terms of Micro F1, Macro F1 and Hamming Loss on a dataset with an imbalanced label distribution, a more complex hierarchical structure and a larger label set. On the basis of this comparison, several directions for future work are suggested.
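The evaluation metrics named above (Micro F1, Macro F1 and Hamming Loss) are standard for multi-label classification. As a hedged illustration only, not the paper's actual evaluation code, the sketch below shows how they can be computed with scikit-learn on toy binary label-indicator matrices; all data here are invented for the example.

```python
import numpy as np
from sklearn.metrics import f1_score, hamming_loss

# Toy multi-label ground truth and predictions:
# 4 samples, 3 labels, encoded as binary indicator matrices.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])

# Micro F1: pools true/false positives and negatives over all labels,
# so frequent labels dominate the score.
micro_f1 = f1_score(y_true, y_pred, average="micro")

# Macro F1: averages per-label F1 scores, weighting rare and
# frequent labels equally -- sensitive to imbalanced distributions.
macro_f1 = f1_score(y_true, y_pred, average="macro")

# Hamming Loss: fraction of individual label assignments that are wrong.
h_loss = hamming_loss(y_true, y_pred)

print(micro_f1, macro_f1, h_loss)
```

The gap between Micro and Macro F1 is one way to see the effect of an imbalanced label distribution: Micro F1 rewards getting frequent labels right, while Macro F1 penalizes poor performance on rare labels.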
