Abstract

Extreme multi-label classification predicts the labels an instance can have in datasets with an enormous number of labels. State-of-the-art approaches consider it more critical to accurately predict relevant labels than non-relevant ones, so they evaluate only the top k label candidates. However, top-k results are not suitable for applications such as drug repositioning, which aims to assign additional labels to instances. In this work, we propose DilXML, an extreme multi-label classifier that suggests a reasonable number of labels for each instance. DilXML overcomes the absence of negative data and the scarcity of instances by decoupling instances and labels. We also propose three criteria for conceptual distance formulas that take into account a hierarchical structure between features. With these, a skew-coordinate feature space better reflects the relatedness between points. DilXML is the first extreme multi-label classification method to conduct example-based evaluations. We compare it against five state-of-the-art approaches: AnnexML, Bonsai, DiSMEC, FastXML, and ProXML. DilXML is the only method that achieves the best performance on all metrics, and it outperforms the others by 10% on every dataset except one. On the targeted medical data, DilXML is 58% better than the other methods on all four evaluation metrics. In addition, we conduct a literature review of drug repositioning candidates and confirm that the newly obtained labels are significantly related to their instances.
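The contrast between top-k evaluation and example-based evaluation can be made concrete with a small sketch. It assumes common textbook definitions of precision@k and of example-based accuracy, precision, recall, and F1; it is an illustration only, not DilXML's code. The abstract does not name its four metrics, and the function names below are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def precision_at_k(ranked: Sequence[str], true: Set[str], k: int) -> float:
    """Top-k precision: fraction of the k highest-ranked labels that are relevant.

    The model must always emit exactly k candidates, whether or not the
    instance actually has k relevant labels.
    """
    top = ranked[:k]
    return sum(1 for label in top if label in true) / k


def example_based_scores(pred: List[Set[str]], true: List[Set[str]]) -> Dict[str, float]:
    """Example-based metrics, computed per instance and averaged.

    The predicted label set may have any size, so a classifier is rewarded
    for proposing a reasonable number of labels rather than a fixed k.
    Empty-set conventions (assumed here) vary between papers.
    """
    n = len(pred)
    acc = prec = rec = f1 = 0.0
    for p, t in zip(pred, true):
        inter = len(p & t)
        union = len(p | t)
        acc += inter / union if union else 1.0          # Jaccard overlap
        prec += inter / len(p) if p else 0.0            # share of predictions that are correct
        rec += inter / len(t) if t else 0.0             # share of true labels recovered
        f1 += 2 * inter / (len(p) + len(t)) if (p or t) else 1.0
    return {"accuracy": acc / n, "precision": prec / n, "recall": rec / n, "f1": f1 / n}


if __name__ == "__main__":
    true_labels = {"a", "b"}
    # Top-k view: a fixed list of 5 ranked candidates.
    print(precision_at_k(["a", "c", "d", "e", "b"], true_labels, 5))
    # Example-based view: a variable-size predicted set.
    print(example_based_scores([{"a", "b", "c"}], [true_labels]))
```

For an instance with true labels {a, b}, precision@5 is 0.4 even though both relevant labels were found, because the metric always counts five slots. The example-based scores for the predicted set {a, b, c} (recall 1.0, F1 0.8) instead reflect how well the size and content of the proposed set match the truth, which is the setting the abstract argues matters for drug repositioning.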