Size Of Decision Tree Research Articles

범주형 목표변수를 잘 예측하기 위한 데이터마이닝 방법 중에서 최근에는 여러 단일 분류자를 결합한 앙상블 기법이 많이 활용되고 있다. 앙상블 기법 가운데 부스팅은 재표본 시 분류하기 어려운 관찰치의 가중치를 높여 분류자가 해당 관찰치에 보다 집중할 수 있도록 함으로써 다른 앙상블 기법에 비해 오차를 효과적으로 감소시키는 방법으로 알려져 있다. 부스팅을 구성하는 분류자를 의사결정나무로 둔 부스팅 트리 모형의 경우 각 트리의 사이즈를 결정해야 하는데, 본 연구에서는 자료 별로 부스팅 트리에 가장 적합한 트리사이즈가 서로 다를수 있다고 가정하고, 주어진 자료에 맞는 트리사이즈를 추정하는 문제에 대해 논의하였다. 우선 트리사이즈가 부스팅 트리의 정확도에 중요한 영향을 미치는가를 파악하기 위하여 28개의 자료를 대상으로 실험을 수행하였으며, 그 결과 트리사이즈를 결정하는 문제가 모형 전체의 성능을 결정하는데 상당한 역할을 한다는 것을 확인할 수 있었다. 또한 그 결과를 바탕으로 최적의 트리사이즈에 영향을 미칠 것으로 판단되는 몇 가지 특성 변수를 정의하고, 해당 변수를 이용하여 부스팅 트리에서의 최적 트리사이즈를 설명하는 모형을 구성해 보았다. 자료 별로 고유한 최적의 트리사이즈는 자료의 특성에 의존적일 가능성도 있으므로 본 연구에서 제안하는 추정방법은 최적 트리사이즈를 결정하기 위한 출발점 또는 가이드라인으로 활용하는 것이 적절할 것이다. 기존에는 부스팅 트리의 사이즈에 대한 값으로 목표변수의 범주의 개수를 활용하였는데, 본 모형에서 제안하는 트리사이즈의 추정치로 부스팅 트리를 구축한 경우 기존방법에 비해 분류정확도를 유의미하게 개선하는 것을 확인할 수 있었다. This article is to find the right size of decision trees that performs better for boosting algorithm. First we defined the tree size D as the depth of a decision tree. Then we compared the performance of boosting algorithm with different tree sizes in the experiment. Although it is an usual practice to set the tree size in boosting algorithm to be small, we figured out that the choice of D has a significant influence on the performance of boosting algorithm. Furthermore, we found out that the tree size D need to be sufficiently large for some dataset. The experiment result shows that there exists an optimal D for each dataset and choosing the right size D is important in improving the performance of boosting. We also tried to find the model for estimating the right size D suitable for boosting algorithm, using variables that can explain the nature of a given dataset. The suggested model reveals that the optimal tree size D for a given dataset can be estimated by the error rate of stump tree, the number of classes, the depth of a single tree, and the gini impurity.

Read full abstract

提出一种潜在属性空间树分类器(latent attribute space tree classifier,简称LAST)框架,通过将原属性空间变换到更容易分离数据或更符合决策树分类特点的潜在属性空间,突破传统决策树算法的决策面局限,改善树分类器的泛化性能.在LAST 框架下,提出了两种奇异值分解斜决策树(SVD (singular value decomposition) oblique decision tree,简称SODT)算法,通过对全局或局部数据进行奇异值分解,构建正交的潜在属性空间,然后在潜在属性空间内构建传统的单变量决策树或树节点,从而间接获得原空间内近似最优的斜决策树.SODT 算法既能够处理整体数据与局部数据分布相同或不同的数据集,又可以充分利用有标签和无标签数据的结构信息,分类结果不受样本随机重排的影响,而且时间复杂度还与单变量决策树算法相同.在复杂数据集上的实验结果表明,与传统的单变量决策树算法和其他斜决策树算法相比,SODT 算法的分类准确率更高,构建的决策树大小更稳定,整体分类性能更鲁棒,决策树构建时间与C4.5 算法相近,而远小于其他斜决策树算法.;A framework of latent attribute space tree classifier (LAST) is proposed in this paper. LAST transforms data from the original attribute space into the latent attribute space, which is easier for data separation or more suitable for tree classifier, so that the decision boundary of the traditional decision tree can be extended and its generalization ability can be improved. This paper presents two SVD (singular value decomposition) oblique decision tree (SODT) algorithms based on the LAST framework. SODT first performs SVD on global and/or local data to construct orthogonal latent attribute space. Then, traditional decision tree or tree nodes are built in that space.Finally, SODT obtains the approximately optimal oblique decision tree of the original space. SODT can not only handle datasets with similar or different distribution between global and local data, but also can make full use of the structure information of the labelled and unlabelled data and produce the same classification results no matter how the observations are arranged. Besides, the time complexity of SODT is identical to that of the univariate decision tree. Experimental results show that compared with the traditional univariate decision tree algorithm C4.5 and the oblique decision tree algorithms OC1 and CART-LC, SODT gives higher classification accuracy, more stable decision tree size and comparable tree-construction time as C4.5, which is much less than that of OC1 and CART-LC.

Read full abstract

Size Of Decision Tree Research Articles

Related Topics

Articles published on Size Of Decision Tree

XClusters: Explainability-First Clustering

Z-number-valued rule-based decision trees

Topological Forest

PHeavy: Predicting Heavy Flows in the Programmable Data Plane

Risk Factors for High-Risk Adenoma on the First Lifetime Colonoscopy Using Decision Tree Method: A Cross-Sectional Study in 6,047 Asymptomatic Koreans.

Improved algorithm of decision tree based on neural network

Wildcard Fields-Based Partitioning for Fast and Scalable Packet Classification in Vehicle-to-Everything.

Sparse alternating decision tree

Decisions: Algebra, Implementation, and First Experiments

Decision Tree Generation Algorithm without Pruning

부스팅 트리에서 적정 트리사이즈의 선택에 관한 연구

Improved direct product theorems for randomized query complexity

New Decision Tree Algorithm with Restrained Factor Involved

Consistency of randomized and finite sized decision tree ensembles

Latent Attribute Space Tree Classifiers

Learning Monotone Decision Trees in Polynomial Time

Finding the right decision tree's induction strategy for a hard real world problem

FINDING SMALL EQUIVALENT DECISION TREES IS HARD

Deterministic and Nondeterministic Decision Trees for Rough Computing

An exponential lower bound on the size of algebraic decision trees for Max

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Size Of Decision Tree Research Articles

Related Topics

Articles published on Size Of Decision Tree

XClusters: Explainability-First Clustering

Z-number-valued rule-based decision trees

Topological Forest

PHeavy: Predicting Heavy Flows in the Programmable Data Plane

Risk Factors for High-Risk Adenoma on the First Lifetime Colonoscopy Using Decision Tree Method: A Cross-Sectional Study in 6,047 Asymptomatic Koreans.

Improved algorithm of decision tree based on neural network

Wildcard Fields-Based Partitioning for Fast and Scalable Packet Classification in Vehicle-to-Everything.

Sparse alternating decision tree

Decisions: Algebra, Implementation, and First Experiments

Decision Tree Generation Algorithm without Pruning

부스팅 트리에서 적정 트리사이즈의 선택에 관한 연구

Improved direct product theorems for randomized query complexity

New Decision Tree Algorithm with Restrained Factor Involved

Consistency of randomized and finite sized decision tree ensembles

Latent Attribute Space Tree Classifiers

Learning Monotone Decision Trees in Polynomial Time

Finding the right decision tree's induction strategy for a hard real world problem

FINDING SMALL EQUIVALENT DECISION TREES IS HARD

Deterministic and Nondeterministic Decision Trees for Rough Computing

An exponential lower bound on the size of algebraic decision trees for Max