Single vs. Multi-Label: The Issues, Challenges and Insights of Contemporary Classification Schemes

Mohammed Salih Ahmed, Mohammed Imran Basheer Ahmed, Naseer Ahmed Sajid, Asiya Abdus Salam, Munir Ahmad, Dhiaa Musleh, Dania Alkhulaifi, Atta Rahman, Reem Alassaf, Sghaier Chabani

doi:10.3390/app13116804

Abstract

Over the decades, the number of documents available in digital form has grown tremendously. Document production has gained so much momentum that its rate doubles every five years. These articles are searched over the internet via search engines, digital libraries, and citation indexes. However, retrieving research papers relevant to a user's query is still a pipe dream, because scientific documents are not indexed according to any subject classification hierarchy. Classifying these documents has therefore become a challenging task for researchers. Document classification can take two forms: single-label classification, which assigns exactly one label to each document, and multi-label classification, which assigns several labels to each document according to the domains it belongs to. Either form can be performed using the available metadata or the full content of the documents. Classification also raises challenges related to the dataset, the feature selection technique, the preprocessing methodology, and the choice of a classification model suited to the documents. This paper highlights the issues of single-label and multi-label classification using either the metadata or the content of documents, and explains why metadata-based approaches are more feasible than content-based approaches.
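The difference between single-label and multi-label assignment can be illustrated as two decision rules applied to the same per-label scores. This is a minimal sketch, assuming some classifier (not specified here, and working from either metadata or full content) has already produced a confidence score for each label. The label names, the scores, and the 0.5 threshold are hypothetical.

```python
from typing import Dict, List


def single_label(scores: Dict[str, float]) -> str:
    """Single-label rule: assign exactly one label, the highest-scoring one."""
    return max(scores, key=lambda label: scores[label])


def multi_label(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label rule: assign every label whose score reaches the threshold.

    The 0.5 default is a hypothetical choice; in practice it would be tuned.
    """
    return sorted(label for label, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    # Hypothetical scores for one paper spanning two research domains.
    scores = {"Information Retrieval": 0.82, "Machine Learning": 0.67, "Databases": 0.12}
    print(single_label(scores))  # Information Retrieval
    print(multi_label(scores))   # ['Information Retrieval', 'Machine Learning']
```

Under the single-label rule the paper's second domain is discarded, whereas the multi-label rule keeps both domains whose scores pass the threshold. This is the behaviour that motivates multi-label schemes for interdisciplinary documents.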
