Abstract

Data mining is a useful tool for discovering knowledge in large datasets. Many methods and algorithms are available in data mining. Classification is the most common method used for extracting rules from large databases. Decision trees are generally used for classification because their simple hierarchical structure is easy for the user to understand and supports decision making. Various classification algorithms exist, based on artificial neural networks, the nearest-neighbour rule, and Bayesian classifiers, but decision tree mining is among the simplest. The ID3 and C4.5 algorithms, introduced by J. R. Quinlan, produce reasonable decision trees. The objective of this paper is to present these algorithms. We first present the classical algorithm, ID3; then, as the focus of this study, we discuss in more detail C4.5, which is a natural extension of ID3. Finally, we compare these two algorithms with others such as C5.0 and CART.

Highlights

  • The construction of decision trees from data is a longstanding discipline

  • Statisticians attribute the paternity to Sonquist and Morgan (1963) [4], who used regression trees in the process of prediction and explanation (AID - Automatic Interaction Detection)

  • It was followed by a whole family of methods, extended to problems of discrimination and classification, based on the same tree-representation paradigm (THAID - Morgan and Messenger, 1973; CHAID - Kass, 1980)


Summary

INTRODUCTION

The construction of decision trees from data is a longstanding discipline. Statisticians attribute the paternity to Sonquist and Morgan (1963) [4], who used regression trees in the process of prediction and explanation (AID - Automatic Interaction Detection). Quinlan was a very active contributor in the second half of the 1980s, with a large number of publications in which he proposed heuristics to improve the behaviour of the system. His approach took a significant turn in the 1990s when he presented the C4.5 method (1993), which is the other essential reference for anyone studying decision trees. Decision trees are a very effective method of supervised learning. Their aim is to partition a dataset into groups that are as homogeneous as possible in terms of the variable to be predicted. They take as input a set of classified data and output a tree that resembles an orientation diagram, where each end node (leaf) is a decision (a class) and each non-final (internal) node represents a test.
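To make this concrete, the sketch below shows how the entropy-based information gain criterion used by ID3 to choose the test at each internal node can be computed; the Information Theory and ID3 sections of the paper develop this criterion in detail. This is a minimal illustration in Python: the function names and the four-row toy dataset (in the spirit of the classic "play tennis" data) are ours, not taken from the paper.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting `rows` on `attribute`.

    `rows` is a list of dicts mapping attribute names to values, and
    `labels` holds the class of each row. ID3 places at each internal
    node the attribute whose split maximises this quantity.
    """
    base = entropy(labels)
    total = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return base - remainder

# Toy illustration: splitting on Outlook separates the classes perfectly,
# so the gain equals the initial entropy of 1 bit.
rows = [
    {"Outlook": "Sunny"}, {"Outlook": "Sunny"},
    {"Outlook": "Overcast"}, {"Outlook": "Rain"},
]
labels = ["No", "No", "Yes", "Yes"]
print(information_gain(rows, labels, "Outlook"))  # prints 1.0
```

C4.5 refines this criterion by using the gain ratio (information gain divided by the split information), which penalises attributes with many distinct values.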

INFORMATION THEORY
ID3 ALGORITHM
Pruning
Example 2
CONCLUSION
