Abstract

For text document clustering (TDC), this work proposes a novel hybrid of the multi-verse optimizer (MVO) algorithm and k-means, called H-MVO. It also proposes a new ensemble method for automatic topic extraction (TE) from a set of scientific publications in the form of text documents, with the purpose of extracting topics from the clustered documents. Existing TE methods often draw upon statistical theory. However, their results may differ even when the same clustered documents are used. Consequently, the topics extracted from the clustered documents can be imprecise, owing to the behavior of the individual TE methods. To address this, the strengths of the TE methods are combined in an ensemble, thereby improving the accuracy of the extracted topics. The results obtained by H-MVO for TDC were compared against 14 well-regarded methods: five clustering methods, seven metaheuristic algorithms, and two hybrid optimization algorithms. The results of the proposed ensemble TE method were likewise compared against those of five established statistical methods from the literature. The findings revealed that the proposed ensemble TE method outperformed all comparative methods on all external measures for almost all datasets. Moreover, the new method can combine the complementary advantages of the five previously proposed methods, yielding improved results.
