Optimal Variable Selection for Clustering Seoul Using a Genetic Algorithm

Hyung Jin Kim, Sang Min Kim, Jung Bin Lee, Jae Hoon Jung, Joon Heo

doi:10.7319/kogsis.2014.22.4.175

Abstract

The Korean government has proposed a new initiative, 'Government 3.0', under which the administration will open its datasets to the public before they are requested. The City of Seoul is a front runner in disclosing government data. If we know which attributes are the governing factors for a given segmentation, these findings can be applied to real-world problems in marketing, business strategy, and administrative decision-making. However, for the City of Seoul, selecting optimal variables from open data with up to several thousand attributes would require an enormous amount of computation time, because the task is a combinatorial optimization problem: a subset of variables must be chosen so as to maximize a dissimilarity measure between clusters. In this study, we acquired a dataset of 718 attributes from Statistics Korea and used a genetic algorithm with Dunn's index to select the variables that best differentiate Gangnam from the other districts. We also used the Microsoft Azure cloud computing platform to reduce processing time. As a result, 28 optimal variables were selected, and validation with Ward's minimum variance method and the K-means algorithm showed that these 28 variables effectively separate Gangnam from the other districts.

Keywords: Clustering, Dunn's Index, Ward's Minimum Variance, K-means Algorithm, Genetic Algorithm
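Dunn's index scores a partition as the smallest distance between two points in different clusters divided by the largest distance between two points in the same cluster, so higher values mean compact, well-separated groups. Because 718 attributes yield 2^718 − 1 non-empty subsets, exhaustive search is out of the question, which is why a heuristic such as a genetic algorithm is used. The Python sketch below illustrates the idea; it is a minimal illustration under stated assumptions, not the authors' implementation. The bitmask encoding, binary tournament selection, one-point crossover, bit-flip mutation, elitism, the parameter values, the helper names `dunn_index` and `genetic_select`, and the toy data are all hypothetical. With real data, the attributes would also need to be standardized first so that large-valued variables do not dominate the Euclidean distances.

```python
import math
import random
from itertools import combinations


def euclidean(a, b, mask):
    """Euclidean distance between rows a and b over the features switched on in mask."""
    return math.sqrt(sum((x - y) ** 2 for x, y, m in zip(a, b, mask) if m))


def dunn_index(rows, labels, mask):
    """Dunn's index = smallest inter-cluster distance / largest intra-cluster diameter.

    Assumes at least two distinct labels; higher means compact, well-separated clusters.
    """
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    groups = list(clusters.values())
    # Largest distance between two members of the same cluster (0.0 if all are singletons).
    max_diameter = max(
        (euclidean(a, b, mask) for g in groups for a, b in combinations(g, 2)),
        default=0.0,
    )
    # Smallest distance between two members of different clusters.
    min_separation = min(
        euclidean(a, b, mask)
        for g1, g2 in combinations(groups, 2)
        for a in g1
        for b in g2
    )
    if max_diameter == 0.0:
        # Degenerate case, e.g. an empty mask: scores 0 unless the clusters are still separated.
        return math.inf if min_separation > 0.0 else 0.0
    return min_separation / max_diameter


def genetic_select(rows, labels, pop_size=30, generations=50, p_mut=0.05, seed=0):
    """Hypothetical GA over feature bitmasks, using Dunn's index as the fitness."""
    rng = random.Random(seed)
    n_feat = len(rows[0])

    def fitness(mask):
        return dunn_index(rows, labels, mask)

    # Each individual is a 0/1 list: gene j = 1 means attribute j is selected.
    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    best = max(pop, key=fitness)[:]
    for _ in range(generations):
        scored = [(fitness(m), m) for m in pop]

        def tournament():
            # Binary tournament: the fitter of two random individuals becomes a parent.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        children = [best[:]]  # elitism: carry over the best mask found so far
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_feat) if n_feat > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate[:]
    return best, fitness(best)


# Hypothetical toy data: 6 districts x 4 attributes (values made up); row 0 plays "Gangnam".
rows = [
    [9.0, 0.2, 5.0, 1.0],
    [1.0, 0.1, 4.0, 3.0],
    [1.2, 0.9, 6.0, 1.5],
    [0.8, 0.5, 5.5, 2.5],
    [1.1, 0.3, 4.5, 0.5],
    [0.9, 0.7, 5.2, 2.0],
]
labels = [1, 0, 0, 0, 0, 0]  # 1 = Gangnam, 0 = other districts
best_mask, best_score = genetic_select(rows, labels)
print(best_mask, round(best_score, 2))
```

On this toy data the search should keep the first attribute, the only one on which row 0 stands apart; adding any other attribute widens the "other districts" cluster more than it increases the separation, which lowers Dunn's index. The Ward's-method and K-means validation step described in the abstract is not shown here.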
