A Study on Selecting the Appropriate Tree Size in Boosting Trees

Ah-Hyoun Kim, Hyun-Joong Kim, Ji-Hyun Kim

doi:10.7465/jkdi.2012.23.5.949

Abstract

Among data mining methods for predicting a categorical target variable, ensemble techniques that combine multiple single classifiers have recently come into wide use. Among these, boosting is known to reduce error more effectively than other ensemble methods: during resampling it raises the weights of observations that are hard to classify, so that the classifier concentrates on them. In a boosting tree model, where the base classifiers are decision trees, the size of each tree must be chosen. In this study we assume that the tree size best suited to boosting trees may differ from dataset to dataset, and we discuss how to estimate the tree size appropriate for a given dataset. First, to determine whether tree size has an important effect on the accuracy of boosting trees, we ran experiments on 28 datasets; the results confirm that the choice of tree size plays a substantial role in the performance of the overall model. Based on those results, we defined several characteristic variables judged likely to influence the optimal tree size and used them to build a model that explains the optimal tree size in boosting trees. Because the optimal tree size of each dataset may depend on the characteristics of that dataset, the proposed estimate is best used as a starting point or guideline for determining the optimal tree size. Previously, the number of categories of the target variable was used as the tree size in boosting trees; boosting trees built with the tree size estimated by our model showed a significant improvement in classification accuracy over that existing approach.

This article seeks the decision tree size that gives the best performance for the boosting algorithm. First, we define the tree size D as the depth of a decision tree. We then compare, by experiment, the performance of boosting with different tree sizes. Although it is a usual practice to keep the trees in boosting small, we find that the choice of D has a significant influence on the performance of boosting. Furthermore, D needs to be sufficiently large for some datasets. The experimental results show that an optimal D exists for each dataset and that choosing the right D is important for improving the performance of boosting. We also build a model for estimating the D suitable for boosting, using variables that describe the nature of a given dataset. The suggested model indicates that the optimal tree size D for a given dataset can be estimated from the error rate of a stump tree, the number of classes, the depth of a single tree, and the Gini impurity.
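As a rough illustration of the dataset descriptors named in the abstract, the sketch below computes three of them (number of classes, Gini impurity, stump error rate) in plain Python. It rests on assumptions the abstract does not confirm: the Gini impurity is taken over the class labels of the whole dataset, the stump is a single threshold split on one numeric feature chosen to minimize training misclassification, and the function names `gini_impurity`, `stump_error_rate`, and `describe_dataset` are hypothetical. The fourth descriptor (depth of a single tree) and the fitted model for D are omitted because the abstract gives no details or coefficients.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a non-empty sequence of class labels."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must be non-empty")
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def stump_error_rate(X, y):
    """Training error rate of the best one-split tree (stump).

    Assumption: numeric features, and each side of the split predicts
    its own majority class. The paper may define this differently.
    """
    n = len(y)
    # Baseline without any split: always predict the overall majority class.
    best_errors = n - max(Counter(y).values())
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Use every distinct value except the largest as a threshold,
        # so both sides of the split are guaranteed to be non-empty.
        for threshold in values[:-1]:
            left = Counter(lab for row, lab in zip(X, y) if row[j] <= threshold)
            right = Counter(lab for row, lab in zip(X, y) if row[j] > threshold)
            errors = (sum(left.values()) - max(left.values())) + (
                sum(right.values()) - max(right.values())
            )
            best_errors = min(best_errors, errors)
    return best_errors / n


def describe_dataset(X, y):
    """Collect the descriptors into one dictionary (hypothetical helper)."""
    return {
        "n_classes": len(set(y)),
        "gini": gini_impurity(y),
        "stump_error": stump_error_rate(X, y),
    }


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = ["a", "a", "b", "b"]
    print(describe_dataset(X, y))
```

Descriptors like these would be the inputs to a model for D; how they are combined is left to the paper itself.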

Journal: Journal of the Korean Data and Information Science Society
Publication Date: Sep 30, 2012
Citations: 4

