Text Document Clustering Research Articles

Automatic topic extraction (TE) from scientific publications provides a compact summary of each cluster's contents, which makes information easier to locate and helps define the boundaries of scientific fields. Text document clustering (TDC) is generally the first step of topic identification: it groups the documents that address related subject matter. Metaheuristics are commonly used as efficient approaches to TDC. The multi-verse optimizer (MVO) is a recently proposed stochastic population-based algorithm that has been applied successfully to many hard optimization problems. Each statistical TE method focuses on a different aspect of the language feature space. This paper designs a novel ensemble method for automatic TE from a collection of scientific publications, with MVO as the clustering algorithm. The automatic TE methods used in the approach are term frequency-inverse document frequency (TF-IDF), most-frequent-based keyword extraction (TF), co-occurrence statistical information-based keyword extraction (CSI), TextRank (TR), and mutual information (MI). Each TE method supplies a group of candidate topics to the ensemble. The ensemble then prunes the candidate set with a filtering heuristic, recalculates the scores of the remaining candidates using the prescribed metrics, and applies dynamic threshold functions to select the set of topics for given scientific publications. The findings show that the refined candidate set is both efficient and effective, and that the new topics improve the system's quality. The proposed method achieved better precision and recall on a similar dataset than state-of-the-art TE methods.
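As a rough illustration of one ingredient of the pipeline described above, the sketch below scores terms with TF-IDF and keeps those that clear a threshold computed from the scores themselves. It is a minimal sketch under stated assumptions, not the paper's method: the function names `tfidf_scores` and `select_topics`, the toy documents, and the choice of the mean score as the "dynamic threshold" are all hypothetical, and the ensemble, filtering heuristic, and MVO clustering are not reproduced.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


def select_topics(score_map):
    """Keep terms scoring above the mean score.

    Assumption: the mean is a stand-in for a dynamic threshold; the
    source does not specify its threshold functions.
    """
    if not score_map:
        return []
    threshold = sum(score_map.values()) / len(score_map)
    return sorted(term for term, s in score_map.items() if s > threshold)


# Toy documents, already tokenized and stripped of stop words (assumed).
docs = [
    "clustering text documents clustering algorithms".split(),
    "optimizer algorithms hard optimization problems".split(),
    "topic extraction text documents".split(),
]
scores = tfidf_scores(docs)
topics = [select_topics(s) for s in scores]
print(topics[0])
```

Terms that appear in several documents (here "text", "documents", "algorithms") receive a low inverse document frequency and fall below the threshold, while terms concentrated in one document survive as candidate topics.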

Text document clustering (TDC) is a key task in text mining and unsupervised machine learning: it partitions a collection of documents into K groups according to a similarity or dissimilarity criterion. The text clustering literature is extensive, and many attempts have been made to solve the TDC problem and improve learning performance. The multi-verse optimizer (MVO) is a recently introduced stochastic population-based algorithm that has been applied successfully to many complex optimization problems. The original MVO uses only the best solution in its exploitation phase (local search), which leaves it prone to entrapment in local optima and a low convergence rate. This paper proposes a modified MVO, the link-based multi-verse optimizer (LBMVO), to strengthen the exploitation phase. The modification adds a neighbor operator to MVO that improves search capability through a novel probability factor, the neighborhood selection strategy (NSS). LBMVO was tested on six standard text clustering datasets and five standard data clustering datasets. The experiments show that MVO with NSS improves error rate, accuracy, recall, precision, F-measure, purity, and entropy, and converges faster. Overall, LBMVO outperformed, or was at least highly competitive with, the original MVO, widely known clustering techniques (spectral, agglomerative, density-based spatial clustering of applications with noise (DBSCAN), K-means, and K-means++), and optimization algorithms such as harmony search (HS), genetic algorithm (GA), particle swarm optimization (PSO), krill herd algorithm (KHA), covariance matrix adaptation evolution strategy (CMAES), and coyote optimization algorithm (COA).
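Two of the evaluation criteria named above, purity and entropy, can be computed directly from true class labels and predicted cluster assignments. The sketch below uses the common textbook definitions (purity as the fraction of documents in their cluster's majority class, entropy as the size-weighted average of per-cluster class entropy in bits, without normalization). The function names and the toy labels are hypothetical, and the paper may use a differently normalized variant.

```python
import math
from collections import Counter, defaultdict


def group_by_cluster(labels_true, labels_pred):
    """Map each predicted cluster id to the true labels of its members."""
    clusters = defaultdict(list)
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters[cluster_id].append(true_label)
    return clusters


def purity(labels_true, labels_pred):
    """Fraction of items that belong to the majority class of their cluster."""
    clusters = group_by_cluster(labels_true, labels_pred)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(labels_true)


def cluster_entropy(labels_true, labels_pred):
    """Size-weighted average class entropy (bits) over clusters; lower is better."""
    clusters = group_by_cluster(labels_true, labels_pred)
    total = len(labels_true)
    weighted = 0.0
    for members in clusters.values():
        size = len(members)
        h = -sum(
            (count / size) * math.log2(count / size)
            for count in Counter(members).values()
        )
        weighted += (size / total) * h
    return weighted


# Toy example: six documents, two true classes, two predicted clusters.
labels_true = ["a", "a", "a", "b", "b", "b"]
labels_pred = [0, 0, 1, 1, 1, 1]
purity_value = purity(labels_true, labels_pred)
entropy_value = cluster_entropy(labels_true, labels_pred)
print(round(purity_value, 3), round(entropy_value, 3))
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference classes, which is why the two are usually reported together.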

Articles published on Text Document Clustering

Text document clustering using mayfly optimization algorithm with k-means technique

Text Document Clustering Using Chaotic Northern Goshawk Optimization with K-means Algorithm

Improved Meta-Heuristic Model for Text Document Clustering by Adaptive Weighted Similarity

Text Document Clustering Approach by Improved Sine Cosine Algorithm

An Enhanced Expectation Maximization Text Document Clustering Algorithm for E-Content Analysis

Dynamic Sub-Swarm Approach of PSO Algorithms for Text Document Clustering.

An Improved Multi-Verse Optimizer for Text Documents Clustering

Optimal Text Document Clustering Enabled by Weighed Similarity Oriented Jaya With Grey Wolf Optimization Algorithm

A hybrid approach for text document clustering using Jaya optimization algorithm

A novel ensemble statistical topic extraction method for scientific publications based on optimization clustering

An ensemble topic extraction approach based on optimization clusters using hybrid multi-verse optimizer for scientific publications

An Improved B-hill Climbing Optimization Technique for Solving the Text Documents Clustering Problem.

Combining Distributed Word Representation and Document Distance for Short Text Document Clustering

Link-based multi-verse optimizer for text documents clustering

Hybridization of a Social Spider Optimization Algorithm with Differential Evolution for Text Document Clustering Using Single Cluster Approach

Text-document clustering-based cause and effect analysis methodology for steel plant incident data

A Survey on Optimization Approaches to Text Document Clustering

Ontology Employment in Text Document Clustering combined with Grouping Algorithm

Evaluation of text document clustering approach based on particle swarm optimization

A Fuzzy Based Approach to Text Mining and Document Clustering
