Missing Data Imputation Based on Hot Deck Using K-nn

Soonchang Kwon

doi:10.9716/kits.2014.13.4.359

Abstract

Submitted: April 25, 2014; 1st Revision: December 10, 2014; Accepted: December 14, 2014
* This research was supported by an Incheon National University intramural research grant.
** Professor, Division of International Trade, Incheon National University

Researchers cannot avoid missing data when collecting data, because some respondents, arbitrarily or non-arbitrarily, do not answer questions in studies and experiments. Missing data not only increase and distort standard deviations but also make parameter estimation less convenient and research results less reliable.

Despite the widespread use of hot deck, researchers have shown little interest in it, since it handles missing data in ambiguous ways. Hot deck can be complemented with k-nn, a machine learning method that can organize donor groups closest to the properties of the missing data. Focusing on this role of k-nn, this study imputes missing data with a hot deck method that uses k-nn.

With hot deck imputation using k-nn set as the study objective, listwise deletion and mean, mode, linear regression, and SVM imputation were compared and verified on nominal and ratio data types, and data closest to the original values were reasonably obtained. Simulations with different numbers of neighbors and different distance measures were carried out, and better k-nn performance was achieved.

This study rediscovers hot deck imputation, which has failed to attract the attention of researchers. As a result, it should help in selecting non-parametric methods that are less likely to be affected by the structure of missing data and its causes.
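The idea of forming a donor group with k-nn and then drawing a hot deck donor from it can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `knn_hot_deck_impute`, the use of Euclidean distance, the restriction of the donor pool to complete rows, the uniform random draw within the donor group, and the toy data are all assumptions made for illustration, and only numeric features are handled.

```python
import numpy as np

def knn_hot_deck_impute(X, k=3, seed=0):
    """Fill NaNs by copying values from a donor drawn among the k nearest complete rows.

    Hypothetical helper for illustration; not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    out = X.copy()
    missing = np.isnan(X)

    # Assumption: only fully observed rows may act as donors.
    donors = X[~missing.any(axis=1)]
    if len(donors) == 0:
        raise ValueError("no complete rows available as donors")

    for i in np.flatnonzero(missing.any(axis=1)):
        obs = ~missing[i]
        # Euclidean distance computed on the columns observed in the recipient row.
        dist = np.sqrt(((donors[:, obs] - X[i, obs]) ** 2).sum(axis=1))
        # Donor group: indices of the k closest complete rows.
        nearest = np.argsort(dist, kind="stable")[: min(k, len(donors))]
        # Hot deck step: draw one donor at random from the donor group.
        donor = donors[rng.choice(nearest)]
        out[i, ~obs] = donor[~obs]
    return out

# Toy data: the last row is missing its third value.
data = np.array([
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 11.0],
    [0.9, 1.9, 12.0],
    [8.0, 9.0, 50.0],
    [1.0, 2.0, np.nan],
])
imputed = knn_hot_deck_impute(data, k=3)
print(imputed[4])
```

In this toy case the three nearest complete rows are the first three, so the imputed value is copied from one of them rather than from the distant fourth row. Changing `k` or swapping the distance computation corresponds to the kind of simulation settings the abstract mentions (number of neighbors, distance measure).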

Full Text