Abstract

Classification and regression fall under the umbrella of the prediction task in data mining. Classification techniques are used to predict discrete values, whereas regression techniques are best suited to predicting continuous values. Analysts from research areas such as data mining, statistics, machine learning, pattern recognition, and big data analytics prefer decision trees over other classifiers because they are simple, effective, and efficient, and their performance is competitive with that of other methods. In this paper, we extensively review many popular state-of-the-art decision tree-based techniques for classification and regression. We present a survey of more than forty years of research focused on applying decision trees to both classification and regression. This survey can serve as a resource for researchers interested in applying decision tree classifiers or regressors in their own work.
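To make the classification/regression distinction concrete, here is a minimal sketch of a one-split decision tree (a "stump") on a single numeric feature. It assumes Gini impurity as the classification criterion and squared error as the regression criterion; these are common choices (e.g. in CART) but are not the only ones covered by the survey. The function name `fit_stump` and the toy data are hypothetical and chosen only for illustration. The only difference between the two tasks is the impurity measure and what a leaf predicts: the majority class for discrete targets, the mean for continuous ones.

```python
from collections import Counter
from statistics import mean


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def sse(values):
    """Sum of squared errors around the mean (a regression impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(xs, ys, task):
    """Fit a hypothetical one-split tree on one numeric feature.

    task="classification": size-weighted Gini impurity; leaves predict the majority class.
    task="regression": sum of squared errors; leaves predict the mean target.
    Returns a function mapping a feature value to a prediction.
    """
    if task == "classification":
        def impurity(part):
            return gini(part) * len(part)  # weight each child by its size

        def leaf(part):
            return Counter(part).most_common(1)[0][0]
    elif task == "regression":
        impurity = sse  # already a sum, so implicitly size-weighted
        leaf = mean
    else:
        raise ValueError("task must be 'classification' or 'regression'")

    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = impurity(left) + impurity(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, leaf(left), leaf(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")

    _, threshold, left_pred, right_pred = best
    return lambda x: left_pred if x <= threshold else right_pred


# Hypothetical toy data: the same feature with a discrete and a continuous target.
xs = [1, 2, 3, 10, 11, 12]
classify = fit_stump(xs, ["a", "a", "a", "b", "b", "b"], "classification")
regress = fit_stump(xs, [1.0, 2.0, 3.0, 10.0, 11.0, 12.0], "regression")
print(classify(2.5), classify(11))  # majority class on each side
print(regress(0), regress(20))      # mean target on each side
```

Full tree-growing algorithms repeat this split search recursively on each child and add stopping or pruning rules; the stump only shows how the leaf prediction and split criterion change with the type of target.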

Highlights

  • With the advancement of technologies, the process of data generation and collection is increasing at an exponential rate

  • We have presented a survey of all the classification and regression tree algorithms in a technical yet easy-to-understand manner

  • Here, k refers to the number of tuples without missing values, Dt is the subset of the dataset containing the tuples to be split on a condition, DtA and DtB are the subsets after partitioning, and β(i) is the correction factor defined as


Summary

Introduction

With the advancement of technologies, data generation and collection are increasing at an exponential rate. Four features, namely tear production rate, age, spectacle prescription, and astigmatism, and three class labels, namely hard, soft, and none, are considered in this example. Based on the class membership probabilities, they estimated the classification accuracy and the quality of the rankings. They observed that logistic regression performed well on smaller training sets, whereas tree induction methods performed better on comparatively larger datasets. The review presented the developments and key ideas underlying these algorithms. He presented a comparative analysis of the classification tree models and the partitions they produce, using the iris dataset from the UCI repository.
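One common way a classification tree chooses which feature to split on first is information gain, the criterion used by ID3-style algorithms. The sketch below computes it for categorical attributes. The attribute names echo two of the features in the example above, but the six records are invented for illustration and are not the actual dataset; the helper names `entropy`, `information_gain`, and `best_attribute` are likewise hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the size-weighted entropy after splitting on attribute."""
    n = len(rows)
    partitions = {}
    for row in rows:
        # Group target labels by the attribute's categorical value.
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy([row[target] for row in rows]) - remainder


def best_attribute(rows, attributes, target):
    """Attribute with the highest information gain (the root split in ID3)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy records (NOT the real dataset), using two of the example's features.
rows = [
    {"tear_rate": "reduced", "astigmatic": "no", "lens": "none"},
    {"tear_rate": "reduced", "astigmatic": "yes", "lens": "none"},
    {"tear_rate": "normal", "astigmatic": "no", "lens": "soft"},
    {"tear_rate": "normal", "astigmatic": "yes", "lens": "hard"},
    {"tear_rate": "reduced", "astigmatic": "no", "lens": "none"},
    {"tear_rate": "normal", "astigmatic": "yes", "lens": "hard"},
]

for attr in ("tear_rate", "astigmatic"):
    print(attr, round(information_gain(rows, attr, "lens"), 3))
print("root split:", best_attribute(rows, ["tear_rate", "astigmatic"], "lens"))
```

On these toy records, splitting on tear rate isolates every "none" case in one branch, so it yields the larger gain and would be chosen as the root. Other algorithms in the survey use different criteria (for example, Gini impurity or chi-square tests), so this is one illustrative choice rather than a general rule.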

DT as a classifier
Application details of the techniques reviewed under DT for classification
Comparative analysis of various classification tree algorithms
DT for regression
CART for regression
GUIDE for regression
Application details of the techniques reviewed under DT for regression
Comparative analysis of regression tree algorithms
Conclusions and future work
