Abstract

The k-means problem is one of the most popular models of cluster analysis. The problem is NP-hard, and the modern literature offers many competing heuristic approaches. Some practical problems require a result (albeit not an exact one) within the framework of the k-means model that would be difficult to improve by known methods without a significant increase in computation time or computational resources. In such cases, genetic algorithms with a greedy agglomerative heuristic crossover operator might be a good choice. However, their computational complexity makes them difficult to use for large-scale problems. The crossover operator, which includes the k-means procedure and takes the absolute majority of the computation time, is essential for such algorithms, and other genetic operators such as mutation are usually eliminated or simplified. Maintaining population diversity, in particular through a mutation operator, becomes more important as the data volume and the available computing resources, such as graphics processing units (GPUs), increase. In this article, we propose a new greedy heuristic mutation operator for such algorithms and investigate the influence of new and well-known mutation operators on the objective function value achieved by genetic algorithms for large-scale k-means problems. Our computational experiments demonstrate that the new mutation operator, as well as the mechanism for organizing subpopulations, improves the result of the algorithm.

Highlights

  • The crossover operator, which includes the k-means procedure and takes the absolute majority of the computation time, is essential for such algorithms, and other genetic operators such as mutation are usually eliminated or simplified. Maintaining population diversity, in particular through a mutation operator, becomes more important as the data volume and the available computing resources, such as graphics processing units (GPUs), increase

  • We propose a new greedy heuristic mutation operator for such algorithms and investigate the influence of new and well-known mutation operators on the objective function value achieved by the genetic algorithms for large-scale k-means problems

  • For large-scale problems, further improvement in the results of the genetic algorithms with greedy heuristic crossover can be achieved by using a special mutation operator and partially isolated solution subpopulations


Summary

Known Methods of Increasing Population Diversity in the Genetic Algorithms

Despite the widespread use of various genetic algorithms for the k-means problem in the modern literature, there is practically no systematization of the approaches used [56,57,58,59].

ALGORITHM 7: Dynamic population size adjustment (replacement for Step 7 of Algorithm 4).
  $N_{iter} \leftarrow N_{iter} + 1$; $N_{POP} \leftarrow \max\{N_{POP}, \lceil \sqrt{1 + N_{iter}} \rceil\}$
  if $N_{POP}$ has changed
    initialize the new individual $S_{N_{POP}}$: generate $S$ randomly, $|S| = p$; $S_{N_{POP}} \leftarrow \mathrm{kMeans}(S)$
  end if

Despite the small populations in genetic algorithms with the greedy agglomerative crossover, a simple approach with two subpopulations improves the result of the algorithm.

The idea of the Variable Neighborhood Search with randomized neighborhoods (see [32]) is based on applying the greedy heuristic procedures (Algorithms 2 and 3) to a current solution and a randomly generated one transformed into a local minimum by Algorithm 1.

Our computational experiments show that, with an increase in computational capacity and in the population size (which grows dynamically with the iteration number), the mutation operator plays a more important role.
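The dynamic population size adjustment of Algorithm 7 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation. It assumes the growth rule reads N_POP ← max{N_POP, ⌈√(1 + N_iter)⌉}, that a solution S is a list of p centers, and that plain Lloyd iterations stand in for the kMeans procedure (Algorithm 1). The names `k_means`, `target_population_size`, and `adjust_population` are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, centers, max_iter=100):
    """Plain Lloyd iterations (assumed stand-in for the paper's Algorithm 1)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        clusters = [[] for _ in centers]
        for pt in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(pt, centers[i]))
            clusters[j].append(pt)
        # Move each center to the mean of its cluster; keep empty ones in place.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def target_population_size(n_pop, n_iter):
    """Assumed rule: the population never shrinks and grows like sqrt(1 + N_iter)."""
    return max(n_pop, math.ceil(math.sqrt(1 + n_iter)))


def adjust_population(population, points, p, n_iter, rng):
    """One pass of the adjustment step; returns the incremented iteration counter."""
    n_iter += 1
    target = target_population_size(len(population), n_iter)
    # Normally at most one individual is added per iteration; the loop also
    # covers a population that starts below the target size.
    while len(population) < target:
        seed = rng.sample(points, p)              # random initial solution, |S| = p
        population.append(k_means(points, seed))  # turn it into a local minimum
    return n_iter
```

With this rule the population grows slowly: starting from a single individual, it reaches size 3 only after the fourth iteration, so the expensive crossover keeps working on a small population while new random local minima are still injected from time to time.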
