Abstract
Uncertainty evaluation based on statistical-probability information entropy is a commonly used mechanism for constructing heuristic methods in decision tree learning. The entropy kernel potentially links its deviation to decision tree classification performance. This paper presents a decision tree learning algorithm based on constrained gain and depth-induction optimization. First, calculation and analysis of the uncertainty distributions of information entropy for single- and multi-value events reveal an enhanced property of the single-value event entropy kernel, peaks in the multi-value event entropy, and a reciprocal relationship between peak location and the number of possible events. Second, this study proposes an estimation method for information entropy in which the entropy kernel is replaced with a peak-shifted sine function, establishing a constrained-gain decision tree (CGDT) learning algorithm. Finally, by combining branch-convergence and fan-out indices under the inductive depth of a decision tree, we build a constrained-gain and depth-induction improved decision tree (CGDIDT) learning algorithm. Results show the benefits of the CGDT and CGDIDT algorithms.
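The kernel-replacement idea in the abstract can be illustrated with a minimal sketch. The paper's exact peak-shifted sine kernel is not reproduced here; the plain kernel sin(pi*p) is used only as a stand-in to show the shared shape: like the Shannon kernel -p*log2(p), it is zero at p = 0 and p = 1 and peaks in between (at p = 0.5, whereas the Shannon kernel peaks at p = 1/e, which is why a peak shift is needed to align the two).

```python
import math

def shannon_entropy(probs):
    """Standard Shannon entropy: H = -sum(p * log2(p)) over nonzero p."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def sine_kernel_entropy(probs):
    """Illustrative entropy estimate with a sine-shaped kernel.

    This uses the unshifted kernel sin(pi * p) purely as an assumption
    for demonstration; the CGDT algorithm's actual kernel is a
    peak-shifted sine function defined in the paper.
    """
    return sum(math.sin(math.pi * p) for p in probs if p > 0)

# Both kernels vanish at certainty (a single event with p = 1)
# and are maximal for a uniform two-event distribution.
uniform = [0.5, 0.5]
certain = [1.0]
```

A distribution concentrated on one event yields (near-)zero under both measures, while the uniform distribution maximizes both, which is the qualitative behaviour an entropy surrogate must preserve for attribute selection.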
Highlights
Decision trees are used extensively in system data modelling and rapid real-time prediction for real complex environments [1,2,3,4,5]
Attribute selection in constructing a decision tree is mostly based on uncertainty heuristic methods, which can be divided into the following categories: the information entropy method based on statistical probability [11,12,13,14], the rough-set-based information entropy method [15,16,17], and the approximate uncertainty calculation method [18,19]
This study proposes an improved decision tree learning algorithm based on constrained gain and depth induction
Summary
Decision trees are used extensively in system data modelling and rapid real-time prediction for real complex environments [1,2,3,4,5]. Given a dataset acquired by field sampling, a decision attribute is determined through a heuristic method [6,7] for training a decision tree. Attribute selection in constructing a decision tree is mostly based on uncertainty heuristic methods, which can be divided into the following categories: the information entropy method based on statistical probability [11,12,13,14], the rough-set-based information entropy method [15,16,17], and the approximate uncertainty calculation method [18,19]. Uncertainty evaluation with Shannon information entropy [20], based on statistical probability, has been used previously to evaluate the uncertainty of sample-set division in decision tree training [21], such as the well-known ID3 and
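The entropy-based attribute selection described above can be sketched as an ID3-style information-gain computation: the gain of an attribute is the parent set's entropy minus the weighted entropy of the subsets produced by splitting on that attribute. The function names and the toy dataset below are illustrative assumptions, not the paper's notation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class-label list."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """ID3-style gain: parent entropy minus weighted child entropy
    after partitioning the samples by the chosen attribute's value."""
    n = len(labels)
    parent = entropy(labels)
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr_index], []).append(y)
    weighted = sum(len(ys) / n * entropy(ys) for ys in parts.values())
    return parent - weighted
```

In training, the attribute with the highest gain is selected as the decision attribute at each node; an attribute that perfectly separates the classes attains the parent entropy as its gain, while an uninformative attribute attains a gain of zero.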