An ensemble topic extraction approach based on optimization clusters using hybrid multi-verse optimizer for scientific publications

Ammar Kamal Abasi, Syibrah Naim, Zaid Abdi Alkareem Alyasseri, Sharif Naser Makhadmeh, Mohammed Azmi Al-Betar, Ahamad Tajudin Khader

doi:10.1007/s12652-020-02439-4

Abstract

This work proposes a novel hybrid of the multi-verse optimizer (MVO) algorithm and k-means, called H-MVO, for text document clustering (TDC). It also proposes a new ensemble method for automatic topic extraction (TE) that extracts topics from clustered documents, applied to a set of scientific publications in the form of text documents. Existing TE methods often draw on statistical theory. However, their results can differ even when the same clustered documents are used, so the topics extracted from the clustered documents can be imprecise owing to the behavior of the individual TE methods. The proposed method therefore ensembles the strengths of the TE methods to improve the accuracy of the extracted topics. The results of H-MVO for TDC were compared against 14 well-regarded methods: five clustering methods, seven metaheuristic algorithms, and two hybrid optimization algorithms. The results of the proposed ensemble TE method were compared against those of five established statistical methods from the literature. The findings show that the proposed ensemble TE method outperformed all comparative methods on all external measures for almost all datasets. Moreover, the new method combines the advantages of the five previously proposed methods, which yields better results.
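To make the idea of ensembling TE methods concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not name the five statistical methods or the combination rule, so everything here is an assumption made for illustration: two stand-in scorers (term frequency and document frequency), a Borda-style rank aggregation, and the hypothetical names `tf_scores`, `df_scores`, and `borda_ensemble`.

```python
from collections import Counter


def tf_scores(docs):
    """Stand-in scorer (assumption): total count of each term in the cluster."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    return dict(counts)


def df_scores(docs):
    """Stand-in scorer (assumption): number of documents containing each term."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc))
    return dict(counts)


def borda_ensemble(score_maps, top_k=3):
    """Combine several term rankings with a Borda-style point count.

    The aggregation rule is an illustrative assumption, not the rule
    used in the paper.
    """
    points = Counter()
    for scores in score_maps:
        # Rank by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        n = len(ranked)
        for position, term in enumerate(ranked):
            points[term] += n - position
    final = sorted(points, key=lambda t: (-points[t], t))
    return final[:top_k]


# One cluster of already tokenized documents (toy data, not from the paper).
cluster = [
    ["cluster", "document", "optimizer", "cluster"],
    ["topic", "cluster", "document"],
    ["topic", "extraction", "document", "topic"],
]
topics = borda_ensemble([tf_scores(cluster), df_scores(cluster)], top_k=3)
print(topics)
```

Rank aggregation is used here because it sidesteps the different score scales of the individual methods; any other combination rule could be substituted without changing the overall structure.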

Full Text