Abstract

As interaction and cooperation between disciplines have become more frequent in recent years, the number of research papers associated with multiple subjects has increased. Some papers in the existing literature belong to a single discipline, while others involve multiple subjects simultaneously. In this setting, traditional single-label text classification makes it harder for readers to obtain comprehensive, cutting-edge research papers. It is therefore important to classify research papers effectively under a multi-label scheme. This paper evaluates multi-label learning on text data obtained from the Kaggle website. First, in the pre-processing stage, lemmatization and Term Frequency-Inverse Document Frequency (TF-IDF) are used for feature extraction: the key information in the text is statistically analysed, and the text is converted into numerical vectors in a high-dimensional space. Because traditional single-label classification algorithms are not suited to this problem, the paper adopts the Multi-Label K-Nearest Neighbour (ML-KNN) algorithm for classification. The experimental results show that ML-KNN achieves better results on the multi-label text classification problem than a traditional multi-label algorithm, which supports the effectiveness of ML-KNN for predicting the labels of text associated with multiple subjects. Finally, the work in this paper is analysed and summarized.
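
To make the pipeline described in the abstract concrete, the following is a minimal sketch of TF-IDF feature extraction followed by ML-KNN classification, written in plain Python. It is not the authors' implementation: the function names, the toy corpus, the labels, the choice of cosine similarity, and the parameter values (k = 2, smoothing s = 1) are all assumptions made for illustration, and lemmatization is replaced by simple lowercasing and whitespace splitting.

```python
import math
from collections import Counter


def fit_idf(token_docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(token_docs)
    df = Counter(term for doc in token_docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(tokens, idf):
    """Sparse, L2-normalised TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def neighbours(query, vectors, k, skip=None):
    """Indices of the k training vectors most similar to the query."""
    candidates = [i for i in range(len(vectors)) if i != skip]
    candidates.sort(key=lambda i: cosine(query, vectors[i]), reverse=True)
    return candidates[:k]


def mlknn_fit(vectors, labels, k=2, s=1.0):
    """Estimate, per label, the prior and the neighbour-count tables of ML-KNN."""
    n, n_labels = len(vectors), len(labels[0])
    prior = [(s + sum(y[l] for y in labels)) / (2 * s + n) for l in range(n_labels)]
    c1 = [[0] * (k + 1) for _ in range(n_labels)]  # instances that have the label
    c0 = [[0] * (k + 1) for _ in range(n_labels)]  # instances that lack the label
    for i, vec in enumerate(vectors):
        nn = neighbours(vec, vectors, k, skip=i)
        for l in range(n_labels):
            count = sum(labels[j][l] for j in nn)
            (c1 if labels[i][l] else c0)[l][count] += 1
    return prior, c1, c0


def mlknn_predict(query, vectors, labels, model, k=2, s=1.0):
    """MAP decision per label from the label counts among the k neighbours."""
    prior, c1, c0 = model
    nn = neighbours(query, vectors, k)
    result = []
    for l in range(len(prior)):
        count = sum(labels[j][l] for j in nn)
        like1 = (s + c1[l][count]) / (s * (k + 1) + sum(c1[l]))
        like0 = (s + c0[l][count]) / (s * (k + 1) + sum(c0[l]))
        result.append(int(prior[l] * like1 > (1 - prior[l]) * like0))
    return result


# Hypothetical toy corpus; label order is [computer science, physics].
texts = [
    "neural network training", "neural network model", "neural network layer",
    "particle field theory", "particle field collider", "particle field detector",
    "quantum computing algorithm", "quantum computing circuit", "quantum computing qubit",
]
labels = [[1, 0]] * 3 + [[0, 1]] * 3 + [[1, 1]] * 3

token_docs = [t.lower().split() for t in texts]
idf = fit_idf(token_docs)
vectors = [tfidf(doc, idf) for doc in token_docs]
model = mlknn_fit(vectors, labels, k=2, s=1.0)

query = tfidf("quantum computing hardware".split(), idf)
print(mlknn_predict(query, vectors, labels, model, k=2, s=1.0))
```

On real data the whitespace tokenizer would be replaced by a proper lemmatizer, and k and s would be tuned on a validation split. The brute-force neighbour search used here is only practical for small corpora.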
