Sample of Groups: A New Strategy to Find a Representative Point for Each Undisclosed Cluster

Wallace A Pinheiro, Ana B S Pinheiro

doi:10.5815/ijitcs.2023.05.01

Abstract

Some problems involving the selection of samples from undisclosed groups are relevant in various areas, such as health, statistics, economics, and computer science. For instance, well-known strategies for selecting a sample from a population include simple random selection and stratified random selection. A related problem is choosing the initial points (seeds) for the K-means clustering algorithm. Many studies propose different strategies for choosing these seeds; however, there is no consensus on which approaches are best or most effective, even for specific datasets or domains. In this work, we present a new strategy, the Sample of Groups (SOG) algorithm, which combines concepts from grid-based, density-based, and maximum-distance clustering algorithms to identify representative points, or samples, located near the center of mass of each cluster. To achieve this, we partition the data into appropriately sized boxes and select representatives of the most relevant boxes. Thus, the main goal of this work is to find high-quality samples, or seeds, that represent different clusters. To compare our approach with other algorithms, we use not only indirect measures related to K-means but also two direct measures that allow a fairer comparison among these strategies. The results indicate that our proposal outperforms the most commonly used algorithms.
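The abstract describes SOG only at a high level: grid boxes, density, and maximum distance. As a rough illustration of how those three ingredients can be combined, the following Python sketch implements a generic grid-and-density seeding heuristic with farthest-first selection. It is not the authors' SOG algorithm: the function name `grid_density_seeds`, the fixed `box_size`, the `min_count` threshold used to decide whether a box is "relevant", and the use of box centroids as representatives are all assumptions made for this example.

```python
import math
from collections import defaultdict


def grid_density_seeds(points, k, box_size, min_count=2):
    """Hypothetical grid + density + maximum-distance seeding sketch.

    NOT the SOG algorithm from the paper: the fixed `box_size`, the
    `min_count` density threshold, and the centroid representative are
    illustrative assumptions.
    """
    if k <= 0 or not points:
        return []
    if box_size <= 0:
        raise ValueError("box_size must be positive")

    # Grid step: assign every point to the axis-aligned box that contains it.
    boxes = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / box_size) for c in p)
        boxes[key].append(p)

    # Density step: summarise each box by (point count, centroid of its points).
    reps = []
    for members in boxes.values():
        dim = len(members[0])
        centroid = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        reps.append((len(members), centroid))
    # Densest boxes first; ties broken by centroid so the result is deterministic.
    reps.sort(key=lambda r: (-r[0], r[1]))

    # Ignore sparse boxes (likely noise) unless that leaves fewer than k boxes.
    candidates = [r for r in reps if r[0] >= min_count]
    if len(candidates) < k:
        candidates = reps

    # Maximum-distance step: start from the densest box, then repeatedly add
    # the candidate whose centroid is farthest from every seed chosen so far.
    seeds = [candidates[0][1]]
    remaining = candidates[1:]
    while len(seeds) < k and remaining:
        best = max(remaining, key=lambda r: min(math.dist(r[1], s) for s in seeds))
        seeds.append(best[1])
        remaining.remove(best)
    return seeds
```

For instance, with three tight groups near (0, 0), (10, 10) and (0, 10) plus a single isolated point at (20, 0), calling `grid_density_seeds(data, k=3, box_size=1.0)` returns one centroid per group and skips the isolated point, because its box holds fewer than `min_count` points. Choosing the box size is the hard part in practice; the abstract states that SOG sizes its boxes to fit the data, which this sketch does not attempt.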


Published in: International Journal of Information Technology and Computer Science
