Abstract

Clustering is an unsupervised learning technique used in data mining to find groups whose objects are more similar to one another than to objects in other groups. However, the absence of a priori knowledge of the optimal clustering criterion, and the strong bias of traditional algorithms towards clusters with a specific shape, size, or density, raise the need for more flexible solutions to find the underlying structures of the data. One response has been to model clustering as an optimization problem and to use meta-heuristics to explore a search space that can favor groups satisfying any desired criterion. F1-ECAC is an evolutionary clustering algorithm with an objective function designed as a supervised learning problem, which evaluates the quality of a partition in terms of its generalization degree, that is, its capability to train an ensemble of classifiers. Our algorithm shows a significant increase in performance and efficiency over its previous version and is highly competitive with state-of-the-art clustering algorithms. The results show that, owing to its innovative clustering criterion, F1-ECAC is usable across a wide variety of problems.

Highlights

  • Data are generated at unprecedented speeds, quantities, and varieties across multiple industries and the processes within them [1]

  • F1-ECAC (F1-Evolutionary Clustering Algorithm using Supervised Classifiers) is a clustering method that draws on supervised learning to solve unsupervised learning problems

  • Unlike common methods in the literature, we proposed an evolutionary approach to clustering that evaluates partition quality by a partition’s generalization degree rather than by a distance or dissimilarity metric between clusters, which avoids bias toward particular cluster shapes
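
The criterion described in the abstract and highlights can be sketched roughly as follows. This is only an illustration, not the authors' implementation. The source states only that a partition is scored by its capability to train an ensemble of classifiers. The two toy classifiers (nearest centroid and 1-nearest neighbor), the single hold-out split, the macro-averaged F1 score, and every name used here (`partition_fitness`, `macro_f1`, and so on) are assumptions made for the example.

```python
import math
import random
from collections import defaultdict


def nearest_centroid_predict(train_X, train_y, test_X):
    """Assign each test point to the class with the closest training centroid."""
    groups = defaultdict(list)
    for x, y in zip(train_X, train_y):
        groups[y].append(x)
    centroids = {
        y: [sum(col) / len(pts) for col in zip(*pts)] for y, pts in groups.items()
    }
    return [min(centroids, key=lambda y: math.dist(centroids[y], x)) for x in test_X]


def one_nn_predict(train_X, train_y, test_X):
    """Assign each test point the label of its nearest training point."""
    preds = []
    for x in test_X:
        nearest = min(range(len(train_X)), key=lambda j: math.dist(train_X[j], x))
        preds.append(train_y[nearest])
    return preds


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def partition_fitness(X, labels, test_fraction=0.3, seed=0):
    """Score a candidate partition by treating its cluster labels as class
    labels: train each classifier on one part of the data and measure how
    well it predicts the cluster labels of the held-out part."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_test = max(1, int(len(X) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train_X = [X[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    test_X = [X[i] for i in test_idx]
    test_y = [labels[i] for i in test_idx]
    ensemble = (nearest_centroid_predict, one_nn_predict)  # assumed ensemble
    scores = [macro_f1(test_y, clf(train_X, train_y, test_X)) for clf in ensemble]
    return sum(scores) / len(scores)


# Toy data: two well-separated 5x4 grids of points (hypothetical example data).
grid = [(k % 5 * 0.1, k // 5 * 0.1) for k in range(20)]
X = grid + [(x + 10.0, y + 10.0) for x, y in grid]
good_labels = [0] * 20 + [1] * 20        # follows the two groups
bad_labels = [i % 2 for i in range(40)]  # ignores the structure

print(partition_fitness(X, good_labels))
print(partition_fitness(X, bad_labels))
```

A partition that follows the structure of the data is easy for the classifiers to learn and scores higher than one that ignores it. In an evolutionary setting, a score of this kind could serve as the fitness of each candidate partition in the population.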


Summary

Introduction

Data are generated at unprecedented speeds, quantities, and varieties across multiple industries and the processes within them [1]. Devices ranging from machinery in a plant to the intelligent devices in our pockets are instrumented for data collection and transmission. This transition has reached personal computing as well as operations, manufacturing, supply chain, marketing, and virtually every sector of productive systems. Such a vast amount of available data brings with it the challenge, inherent to data science methods, of extracting insights that transform data into knowledge and decisions [2]. Even though clustering generally seeks to form compact and isolated groups, the lack of a standard cluster definition has given rise to multiple methods over time [7], [8].

