Abstract

This manuscript demonstrates the sensitivity of the elbow rule when it is used to determine the optimal number of clusters in high-dimensional spaces characterized by tightly distributed data points. The high-dimensional data samples are not artificially generated; they are taken from a real-world evolutionary many-objective optimization. They comprise Pareto fronts from the last 10 generations of an evolutionary optimization computation with 14 objective functions. Pareto fronts were chosen deliberately: a user typically needs only one solution from the Pareto set to implement, so a systematic means of reducing the cardinality of solutions is imperative. The manuscript therefore covers clustering the data and identifying the cluster from which to pick the desired solution, highlighting the implementation of the elbow rule and the use of hyper-radial distances for cluster identity. The Calinski-Harabasz statistic was favored for determining the criteria used in the elbow rule because of its robustness: it takes into account both the variance within clusters and the variance between clusters. This exercise also provided an opportunity to revisit the justification for using the highest Calinski-Harabasz criterion to determine the optimal number of clusters for multivariate data. The elbow rule predicted the upper end of the range for the optimal number of clusters, whereas the highest Calinski-Harabasz criterion favored the lower end. Both results are used in a unique way for understanding high-dimensional data, although it remains inconclusive which of the two methods determines the true optimal number of clusters.
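To make the two selection rules concrete, the following is a minimal sketch, not the authors' implementation. It computes the Calinski-Harabasz criterion from its standard definition (between-cluster dispersion over within-cluster dispersion, each divided by its degrees of freedom). It also locates an elbow with a common "farthest point from the chord" heuristic. The abstract does not say how the elbow was located, so that heuristic and the function names `calinski_harabasz` and `elbow_by_chord` are assumptions made for illustration.

```python
import math


def calinski_harabasz(points, labels):
    """Calinski-Harabasz criterion for one labelled partition.

    points: list of equal-length numeric sequences
    labels: one cluster label per point
    """
    n = len(points)
    dim = len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of points")

    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = 0.0  # between-cluster sum of squares
    within = 0.0   # within-cluster sum of squares
    for members in clusters.values():
        m = len(members)
        centroid = [sum(p[d] for p in members) / m for d in range(dim)]
        between += m * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        within += sum((x - c) ** 2 for p in members for x, c in zip(p, centroid))
    # Undefined (division by zero) if every point sits exactly on its centroid.
    return (between / (k - 1)) / (within / (n - k))


def elbow_by_chord(ks, values):
    """Assumed elbow heuristic: the k whose point lies farthest from the
    straight line joining the first and last points of the curve, after
    rescaling both axes to [0, 1]."""
    if len(ks) < 3 or max(values) == min(values):
        raise ValueError("need at least 3 points and a non-flat curve")
    lo, hi = min(values), max(values)
    xs = [(k - ks[0]) / (ks[-1] - ks[0]) for k in ks]
    ys = [(v - lo) / (hi - lo) for v in values]
    ax, ay, bx, by = xs[0], ys[0], xs[-1], ys[-1]
    length = math.hypot(bx - ax, by - ay)
    dists = [
        abs((bx - ax) * (ay - y) - (ax - x) * (by - ay)) / length
        for x, y in zip(xs, ys)
    ]
    return ks[dists.index(max(dists))]
```

Under this sketch, the "highest criterion" rule is simply the k that maximizes `calinski_harabasz` over the candidate partitions, while `elbow_by_chord` is applied to the whole curve of criterion values; the two rules can therefore select different k, as the abstract reports.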

Highlights

  • This manuscript targets the problem of reducing the cardinality of high-dimensional solutions that are encountered in heuristic many-objective optimization problems

  • The method was as follows (the visualization steps are included only to inform the reader): (a) generate normalized data for 10 high-dimensional datasets; (b) visualize the high-dimensional spaces using Sammon’s nonlinear mapping or classical MDS; (c) determine the Calinski-Harabasz criteria for the high-dimensional spaces; (d) determine the optimal number of clusters using the elbow rule; (e) determine the clusters using K-means++ for the elbow rule; (f) define the clusters on the basis of summary statistics of hyper-radial distances; and (g) visualize the high-dimensional spaces highlighting the clusters determined from the highest

  • The initial step was to min–max normalize the objectives for each solution across the 10 datasets and rearrange them into three super-objective matrices namely, profitability, productivity and environmental impact, to make it possible to visualize the data in 3D: (a) maxProfit: income, costs and Earnings Before Interest and Taxes (EBIT); (b) maxProd: beef, wool, sheepmeat, milksolids, sawlog and pulpwood; and, (c) minEnv: nitrate leaching, phosphorus loss, sedimentation, water quantity, and CO2e
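The normalization and cluster-identity steps in the highlights above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code. The short objective keys are hypothetical names for the 14 objectives listed. The hyper-radial distance is assumed here to be the Euclidean norm of the normalized objective values, which may differ from the manuscript's exact definition. The summary statistics chosen (mean, minimum, maximum) are also an assumption.

```python
import math
import statistics

# Grouping of the 14 objectives into the three super-objectives named in
# the text. The short keys are hypothetical labels chosen for this sketch.
GROUPS = {
    "maxProfit": ["income", "costs", "ebit"],
    "maxProd": ["beef", "wool", "sheepmeat", "milksolids", "sawlog", "pulpwood"],
    "minEnv": ["nitrate", "phosphorus", "sediment", "water", "co2e"],
}


def min_max_normalize(rows):
    """Rescale every objective to [0, 1] across all solutions.

    rows: list of dicts mapping objective name -> raw value.
    A constant objective is mapped to 0.0 to avoid dividing by zero.
    """
    keys = list(rows[0].keys())
    lo = {k: min(r[k] for r in rows) for k in keys}
    hi = {k: max(r[k] for r in rows) for k in keys}
    return [
        {k: (r[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0 for k in keys}
        for r in rows
    ]


def super_objectives(row):
    """Split one normalized solution into the three super-objective groups."""
    return {name: [row[k] for k in cols] for name, cols in GROUPS.items()}


def hyper_radial_distance(values):
    """Assumed definition: Euclidean norm of the normalized values."""
    return math.sqrt(sum(v * v for v in values))


def hrd_summary_by_cluster(rows, labels):
    """Mean, minimum and maximum hyper-radial distance per cluster label."""
    by_label = {}
    for row, lab in zip(rows, labels):
        by_label.setdefault(lab, []).append(hyper_radial_distance(row.values()))
    return {
        lab: {"mean": statistics.mean(d), "min": min(d), "max": max(d)}
        for lab, d in by_label.items()
    }
```

A per-cluster summary of this kind gives each cluster a scalar identity, so that clusters can be compared when choosing the one from which to pick a solution.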


Summary

Introduction

This manuscript targets the problem of reducing the cardinality of high-dimensional solutions that are encountered in heuristic many-objective optimization problems. The holy grail in the heuristic search optimization domain is essentially to minimize the subjective aspects (mainly tied to visualization) of reducing the cardinality of solutions, and the deployment of time-consuming Multi-Criteria Decision Making (MCDM) to determine the unique solution from the shortlist. The techniques employed apply to any exploratory data analytics; as such, the introduction is written in a generic sense, and the results and discussion should be valuable to general data science. In other words, this is a case where data science meets heuristic search optimization.

