AN APPLICATION OF GENETIC ALGORITHM FOR CLUSTERING OBSERVATIONS WITH INCOMPLETE DATA

Frisca Rizki Ananda, Asep Saefuddin, Bagus Sartono

doi:10.29244/ijsa.v1i1.48

Open Access

https://doi.org/10.29244/ijsa.v1i1.48

Abstract

Cluster analysis is a method for classifying observations into several clusters. A common clustering strategy uses distance as a similarity index. However, a distance-based approach cannot be applied when the data are incomplete. To address this problem, a Genetic Algorithm that involves variance (GACV) is applied. This study employed GACV on the Iris data introduced by Sir Ronald Fisher. Clustering of incomplete data was carried out on data sets produced by deleting some values from the Iris data. The algorithm was developed in R 3.0.2 and gave satisfactory results for clustering the complete data, with 95.99% sensitivity and 98% consistency. GACV can cluster observations with missing values without filling in the missing values or excluding those observations. Performance on incomplete observations is also satisfactory but tends to decrease as the proportion of incomplete values increases. The proportion of incomplete values should be at most 40% to obtain sensitivity and consistency of at least 90%.

Keywords: Cluster Analysis, Genetic Algorithm, Incomplete Data.

Highlights

Cluster analysis is an important technique in a wide variety of fields, such as psychology, economics, biology, bioinformatics, medicine, business and marketing, social science, the World Wide Web, and data mining.
Zadeh et al. (2011) applied cluster analysis to profile a bank's customers based on their behavior.
Because the species of each observation in the Iris data is known, the clusters produced by GACV can be compared with the correct clusters to assess its performance.

Summary

Introduction

Cluster analysis is an important technique in a wide variety of fields, such as psychology, economics, biology, bioinformatics, medicine, business and marketing, social science, the World Wide Web, and data mining. Zadeh et al. (2011) applied cluster analysis to profile a bank's customers based on their behavior. Most clustering methods employ distance as a similarity index for clustering observations. This index requires complete information for all observations. Sometimes, however, we are faced with observations that have incomplete values for some variables. This disrupts the calculation of the distances between observations, so we must either fill in the missing values or exclude those observations. Filling in the missing values introduces additional error into the analysis because the missing values have to be estimated, whereas excluding observations reduces the available information. Moreover, sometimes we want to know the cluster membership of the very observations that have incomplete values, in which case exclusion cannot be applied. Employing a different similarity index with other approaches can overcome this problem.
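To make the contrast concrete, the following is a minimal Python sketch (the study itself used R) of why a distance breaks down on incomplete observations while a variance-based criterion need not. It is not the authors' GACV fitness function, which this excerpt does not specify. The function names, the toy data, and the choice to compute each variable's variance over its observed values only are all assumptions made for illustration.

```python
import numpy as np

def euclidean(a, b):
    """Plain Euclidean distance; a NaN in either vector propagates to the result."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

def within_cluster_variance(X, labels):
    """Sum, over clusters and variables, of the variance of the observed values.

    Assumption for this sketch: missing entries (NaN) are skipped per variable,
    so an incomplete observation still contributes through the variables it has.
    """
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        for j in range(X.shape[1]):
            column = members[:, j]
            observed = column[~np.isnan(column)]
            if observed.size > 1:  # variance needs at least two observed values
                total += float(np.var(observed))
    return total

# Hypothetical toy data: two well-separated groups, two incomplete observations.
X = np.array([
    [1.0, 2.0],
    [1.2, np.nan],
    [0.8, 2.2],
    [5.0, 8.0],
    [np.nan, 8.4],
    [5.2, 7.6],
])
good = np.array([0, 0, 0, 1, 1, 1])  # assignment matching the two groups
bad = np.array([0, 1, 0, 1, 0, 1])   # assignment mixing the two groups

distance_with_gap = euclidean(X[0], X[1])  # undefined because X[1] has a gap
score_good = within_cluster_variance(X, good)
score_bad = within_cluster_variance(X, bad)
```

Here the distance between the first two observations is NaN, whereas both candidate assignments still receive a finite score, and the assignment that matches the two groups scores lower. A genetic algorithm could then search over label vectors to minimize a criterion of this kind, without imputing or discarding the incomplete observations.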

Journal: Indonesian Journal of Statistics and Its Applications
Publication Date: Oct 31, 2017
License type: CC BY
