A scalable decision tree system and its application in pattern recognition and intrusion detection

Xiaobai Li

doi:10.1016/j.dss.2004.06.016

Abstract

One of the most challenging problems in data mining is to develop scalable algorithms capable of mining massive data sets whose sizes exceed the capacity of a computer's memory. In this paper, we propose a new decision tree algorithm, named SURPASS (for Scaling Up Recursive Partitioning with Sufficient Statistics), that is highly effective in handling such large data sets. SURPASS incorporates linear discriminants into the recursive partitioning process of decision trees. In SURPASS, the information required to build a decision tree is summarized into a set of sufficient statistics, which can be gathered incrementally by reading the data from storage into main memory one subset at a time. As a result, the data size that can be handled by this algorithm is independent of memory size. We apply SURPASS to three large data sets pertaining to pattern recognition and intrusion detection problems. The results indicate that SURPASS scales up well to large data sets and produces very high-quality decision tree models.
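The sketch below illustrates the central idea in the abstract: the statistics needed for a linear-discriminant split can be accumulated chunk by chunk, so memory use does not depend on the number of rows. It is a hypothetical illustration, not the SURPASS algorithm itself. It makes three assumptions. First, it considers a single two-class node. Second, it uses Fisher's linear discriminant as the split rule. Third, it keeps per-class counts, feature sums and cross-product sums as the sufficient statistics. The function names `update_stats`, `solve` and `fisher_split` are invented for this example.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Tuple[Sequence[float], str]  # (feature vector, class label)


def update_stats(stats: Dict[str, list], chunk: Iterable[Row], d: int) -> None:
    """Fold one in-memory chunk into per-class sufficient statistics.

    For each class k we keep [n_k, S_k, Q_k]: the row count, the feature
    sums, and the cross-product sums Q_k = sum of x x^T. Storage is
    O(d^2) per class no matter how many rows are streamed, so the chunk
    can be discarded once it has been folded in.
    """
    for x, label in chunk:
        n, s, q = stats.setdefault(
            label, [0, [0.0] * d, [[0.0] * d for _ in range(d)]]
        )
        stats[label][0] = n + 1
        for i in range(d):
            s[i] += x[i]
            for j in range(d):
                q[i][j] += x[i] * x[j]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fisher_split(stats: Dict[str, list], a: str, b: str) -> Tuple[List[float], float]:
    """Fisher discriminant between classes a and b, computed from stats only.

    Within-class scatter for class k is Q_k - n_k * mu_k mu_k^T, so no raw
    rows are needed. Assumes the pooled scatter matrix is non-singular.
    Returns (w, t): a point x falls on the 'a' side when w . x > t.
    """
    d = len(stats[a][1])
    means = {}
    sw = [[0.0] * d for _ in range(d)]
    for k in (a, b):
        n, s, q = stats[k]
        mu = [v / n for v in s]
        means[k] = mu
        for i in range(d):
            for j in range(d):
                sw[i][j] += q[i][j] - n * mu[i] * mu[j]
    diff = [means[a][i] - means[b][i] for i in range(d)]
    w = solve(sw, diff)  # direction proportional to S_w^{-1} (mu_a - mu_b)
    mid = [(means[a][i] + means[b][i]) / 2 for i in range(d)]
    t = sum(wi * mi for wi, mi in zip(w, mid))  # threshold at the midpoint
    return w, t
```

For example, you could stream two small chunks of labelled points through `update_stats(stats, chunk, 2)` and then call `fisher_split(stats, "normal", "attack")`. The call yields a direction and a threshold that define the node's oblique split, and no chunk has to stay in memory. Because the statistics add up across chunks, the result is the same no matter how the rows are partitioned into chunks.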

Similar Papers

Lead the way for us

Similar Papers

A scalable decision tree system and its application in pattern recognition and intrusion detection
Xiao-Bai Li
Decision Support Systems | VOL. 41
26 Aug 2004

A scalable decision tree system and its application in pattern recognition and intrusion detection
01 Nov 2005

The establishment of a decision tree model for the individualized treatment of spinal metastases based on RPA
Chinese Journal of Orthopaedics | VOL. 38
16 Jul 2018

Comparison of two data mining techniques in labeling diagnosis to Iranian pharmacy claim dataset: artificial neural network (ANN) versus decision tree model.
Mahmoud Mahmoudi ... Alireza Mesdaghinia
Archives of Iranian medicine | VOL. 17
01 Dec 2014