Distributed Data Classification with Coalition-Based Decision Trees and Decision Template Fusion

Abstract

In distributed data environments, classification tasks are challenged by inconsistencies across independently maintained sources, and such environments are inherently characterized by high informational uncertainty. This paper proposes a novel framework that integrates conflict analysis, coalition formation, decision tree induction, and decision template fusion to address these challenges and to reduce the entropy of the overall decision-making process. The method begins by identifying compatible data sources using Pawlak's conflict model, forming coalitions that aggregate complementary information. Each coalition trains a decision tree classifier, and the final decision is derived through decision templates that fuse the probabilistic outputs of all models. The proposed approach is compared with a variant that does not use coalitions, where each local source is modeled independently. Additionally, the framework extends previous work based on decision rules by introducing decision trees, which offer greater modeling flexibility while preserving interpretability. Experimental results on benchmark datasets from the UCI repository demonstrate that the proposed method consistently outperforms both the non-coalition variant and the rule-based version, particularly under moderate data dispersion. The key contributions of this work include the integration of coalition-based modeling with decision trees, the use of decision templates for interpretable fusion, and the demonstration of improved classification performance across diverse scenarios.
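The decision-template fusion stage described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: each training sample's decision profile (one row of class supports per coalition classifier) is averaged per true class to form templates, and a new profile is assigned the class of the nearest template. The number of coalitions, the toy profiles, and the Euclidean distance are illustrative assumptions.

```python
import math

def build_templates(profiles, labels, n_classes):
    """Average the decision profiles of all training samples per true class."""
    templates = {}
    for c in range(n_classes):
        members = [p for p, y in zip(profiles, labels) if y == c]
        # element-wise mean over the stacked classifier outputs
        templates[c] = [
            [sum(row[k][j] for row in members) / len(members)
             for j in range(n_classes)]
            for k in range(len(members[0]))
        ]
    return templates

def classify(profile, templates):
    """Pick the class whose template is nearest (Euclidean) to the profile."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2
                             for ra, rb in zip(a, b)
                             for x, y in zip(ra, rb)))
    return min(templates, key=lambda c: dist(profile, templates[c]))

# Decision profiles: one row per coalition classifier, columns = class supports.
train_profiles = [
    [[0.9, 0.1], [0.8, 0.2]],   # strongly class 0
    [[0.7, 0.3], [0.6, 0.4]],   # class 0
    [[0.2, 0.8], [0.1, 0.9]],   # class 1
    [[0.3, 0.7], [0.4, 0.6]],   # class 1
]
train_labels = [0, 0, 1, 1]

templates = build_templates(train_profiles, train_labels, n_classes=2)
print(classify([[0.85, 0.15], [0.7, 0.3]], templates))  # nearest to the class-0 template
```

In the full framework the profiles would come from each coalition's decision tree (e.g. class-frequency estimates at the reached leaf), but the fusion arithmetic is the same.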

Similar Papers
  • Conference Article
  • Citations: 8
  • 10.1109/skima47702.2019.8982419
Big Data with Decision Tree Induction
  • Aug 1, 2019
  • Shabnam Sabah + 5 more

Big data mining is one of the major challenging research issues in the field of machine learning for data mining applications in the present digital era. Big data is characterized by the 3 V's: (1) volume - a massive amount of data/too many bytes, (2) velocity - high-speed streaming data/too high a rate, and (3) variety - data coming from different sources/too many sources. Collecting and managing real-life big data is a difficult task, as big data is so big that we cannot keep all the data together on a single machine. Therefore, we need advanced relational database management systems with parallel computing to deal with big data. Knowledge mining from big data using traditional machine learning and data mining techniques is a big issue and attracts computational intelligence researchers to this area. In this paper, we have used the decision tree (DT) induction method for mining big data. Decision tree induction is one of the most preferred and well-known supervised learning techniques; it is a top-down recursive divide-and-conquer algorithm and requires little prior knowledge for constructing a classifier. Traditional DT algorithms such as Iterative Dichotomiser 3 (ID3), C4.5 (a successor of ID3), and Classification and Regression Trees (CART) are generally built for mining relatively small datasets, so we need a more scalable decision tree learning approach for mining big data. In this paper, we have generated several trees using two scalable decision tree algorithms, RainForest and the Bootstrapped Optimistic Algorithm for Tree construction (BOAT), on seven benchmark datasets from the Keel Repository and the UCI Machine Learning Repository. We have compared the performance of the RainForest and BOAT algorithms. We have also proposed a decision tree merging approach, as decision tree merging is a very complex and challenging task.
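The partition-and-merge idea can be sketched with a toy example, assuming a trivial one-feature threshold "stump" learner as a stand-in for the RainForest/BOAT trees and majority voting as the merge rule (the paper's actual merging approach is more involved):

```python
# Hedged sketch: train one tiny model per data partition, then merge by vote.
# The partitions, the stump learner, and the two-class data are illustrative.

def train_stump(partition):
    """Pick a threshold midway between the two class means (classes 0 and 1)."""
    zeros = [x for x, y in partition if y == 0]
    ones = [x for x, y in partition if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def predict(stumps, x):
    votes = sum(x > t for t in stumps)   # each stump votes 1 if x exceeds its threshold
    return int(votes * 2 > len(stumps))  # majority vote across partitions

partitions = [
    [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)],
    [(0.5, 0), (1.5, 0), (7.5, 1), (9.5, 1)],
    [(2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1)],
]
stumps = [train_stump(p) for p in partitions]
print(predict(stumps, 8.5), predict(stumps, 1.2))
```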

  • Book Chapter
  • Citations: 8
  • 10.1007/0-387-34296-6_10
Multi-Attribute Decision Trees and Decision Rules
  • Jan 1, 2006
  • Jun-Youl Lee + 1 more

Among the numerous learning tasks that fall within the field of knowledge discovery in databases, classification may be the most common. Furthermore, top-down induction of decision trees is one of the most popular techniques for inducing such classification models. Most of the research in decision tree induction has focused on single attribute trees, but in this chapter we review multi-attribute decision trees induction and discuss how such methods can improve both the accuracy and simplicity of the decision trees. As an example of this approach we consider the recently proposed second order decision tree induction (SODI) algorithm, which uses conjunctive and disjunctive combinations of two attributes for improved decision tree induction in nominal databases. We show via numerical examples that in many cases this generates more accurate classification models and easier to interpret decision trees and rules.

  • Research Article
  • Citations: 18
  • 10.1111/coin.12049
Creating Decision Trees from Rules using RBDT‐1
  • Jul 28, 2014
  • Computational Intelligence
  • Amany Abdelhalim + 2 more

Most of the methods that generate decision trees for a specific problem use the examples of data instances in the decision tree–generation process. This article proposes a method called RBDT-1 (rule-based decision tree) for learning a decision tree from a set of decision rules that cover the data instances rather than from the data instances themselves. The goal is to create on demand a short and accurate decision tree from a stable or dynamically changing set of rules. The rules could be generated by an expert, by an inductive rule learning program that induces decision rules from the examples of decision instances, such as AQ-type rule induction programs, or extracted from a tree generated by another method, such as ID3 or C4.5. In terms of tree complexity (number of nodes and leaves in the decision tree), RBDT-1 compares favorably with AQDT-1 and AQDT-2, which are methods that create decision trees from rules. RBDT-1 also compares favorably with ID3, and it is as effective as C4.5, where both ID3 and C4.5 are well-known methods that generate decision trees from data examples. Experiments show that the classification accuracies of the decision trees produced by all methods under comparison are indistinguishable.
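A minimal sketch of the rule-to-tree idea follows. The attribute-selection criterion here is a simplified "most frequently used attribute" stand-in, not RBDT-1's actual attribute-effectiveness measure, and the rule set is a toy example:

```python
# Hedged sketch: build a decision tree from declarative rules rather than
# from data instances. Each rule is (conditions, class).

def pick_attribute(rules):
    """Choose the attribute mentioned in the most rules (simplified criterion)."""
    counts = {}
    for conditions, _cls in rules:
        for attr in conditions:
            counts[attr] = counts.get(attr, 0) + 1
    return max(counts, key=counts.get)

def build_tree(rules):
    classes = {cls for _, cls in rules}
    if len(classes) == 1:               # all remaining rules agree: leaf node
        return classes.pop()
    attr = pick_attribute(rules)
    tree = {"split": attr, "branches": {}}
    values = {cond[attr] for cond, _ in rules if attr in cond}
    for v in values:
        # keep rules consistent with this branch, dropping the used attribute;
        # rules that do not mention attr flow into every branch
        subset = [({a: x for a, x in cond.items() if a != attr}, cls)
                  for cond, cls in rules if cond.get(attr, v) == v]
        tree["branches"][v] = build_tree(subset)
    return tree

rules = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "sunny", "humidity": "normal"}, "yes"),
    ({"outlook": "overcast"}, "yes"),
]
tree = build_tree(rules)
print(tree["split"])  # the most frequently used attribute becomes the root
```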

  • Book Chapter
  • 10.1007/978-90-481-9419-3_40
RBDT-1 Method: Combining Rules and Decision Tree Capabilities
  • Jan 1, 2010
  • Amany Abdelhalim + 1 more

Most of the methods that generate decision trees for a specific problem use examples of data instances in the decision tree generation process. This chapter proposes a method called "RBDT-1" (rule-based decision tree) for learning a decision tree from a set of decision rules that cover the data instances rather than from the data instances themselves. RBDT-1 uses a set of declarative rules as input for generating a decision tree. The method's goal is to create on demand a short and accurate decision tree from a stable or dynamically changing set of rules. We conduct a comparative study of RBDT-1 with existing decision tree methods on different problems. The outcome of the study shows that in terms of tree complexity (number of nodes and leaves in the decision tree) RBDT-1 compares favorably to AQDT-1 and AQDT-2, which are methods that create decision trees from rules. RBDT-1 also compares favorably to ID3, a well-known method that generates decision trees from data examples. Experiments show that the classification accuracies of the decision trees produced by the methods under comparison are equal. Keywords: Decision Tree; Leaf Node; Decision Structure; Decision Class; Decision Tree Method.

  • Research Article
  • Citations: 24
  • 10.1016/j.eswa.2012.04.073
Improving medical decision trees by combining relevant health-care criteria
  • Apr 30, 2012
  • Expert Systems with Applications
  • Joan Albert López-Vallverdú + 2 more

  • Conference Article
  • Citations: 39
  • 10.1109/icmla.2009.25
A New Method for Learning Decision Trees from Rules
  • Dec 1, 2009
  • Amany Abdelhalim + 1 more

Most of the methods that generate decision trees use examples of data instances in the decision tree generation process. This paper proposes a method called "RBDT-1" (rule-based decision tree) for learning a decision tree from a set of decision rules that cover the data instances rather than from the data instances themselves. RBDT-1 uses a set of declarative rules as input for generating a decision tree. The method's goal is to create on demand a short and accurate decision tree from a stable or dynamically changing set of rules. We conduct a comparative study of RBDT-1 with three existing decision tree methods on different problems. The outcome of the study shows that, in terms of tree complexity (number of nodes and leaves in the decision tree), RBDT-1 performs better than AQDT-1 and AQDT-2, which create decision trees from rules, and than ID3, which generates decision trees from data examples.

  • Research Article
  • Citations: 252
  • 10.1109/32.9061
Learning from examples: generation and evaluation of decision trees for software resource analysis
  • Jan 1, 1988
  • IEEE Transactions on Software Engineering
  • R.W Selby + 1 more

A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for one problem domain, specifically, that of software resource data analysis. The purpose of the decision trees is to identify classes of objects (software modules) that had high development effort, i.e. in the uppermost quartile relative to past data. Sixteen software systems ranging from 3000 to 112000 source lines have been selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4700 objects, capture a multitude of information about the objects: development effort, faults, changes, design style, and implementation style. A total of 9600 decision trees are automatically generated and evaluated. The analysis focuses on the characterization and evaluation of decision tree accuracy, complexity, and composition. The decision trees correctly identified 79.3% of the software modules that had high development effort or faults, on the average across all 9600 trees. The decision trees generated from the best parameter combinations correctly identified 88.4% of the modules on the average. Visualization of the results is emphasized, and sample decision trees are included.

  • Conference Article
  • Citations: 3
  • 10.1109/reconfig.2014.7032538
Memory optimisation for hardware induction of axis-parallel decision tree
  • Dec 1, 2014
  • Chuan Cheng + 1 more

In data mining and machine learning applications, the decision tree classifier is widely used as a supervised learning method, not only as a stand-alone model but also as part of an ensemble learning technique (e.g. Random Forest). The induction of decision trees (i.e. the training stage) involves intense memory communication and inherent parallel processing, making an FPGA device a promising platform for accelerating the training process due to the high memory bandwidth enabled by the embedded memory blocks in the device. However, peak memory bandwidth is reached when all the channels of the block RAMs on the FPGA are free for concurrent communication, whereas to accommodate large datasets several block RAMs are often combined, leaving a number of memory channels unavailable. Therefore, efficient use of the embedded memory is critical not only for allowing larger training datasets to be processed on an FPGA but also for making available as many memory channels as possible to the rest of the system. In this work, a data compression scheme is proposed for the training data stored in the embedded memory to improve the memory utilisation of the device, targeting specifically the axis-parallel decision tree classifier. The proposed scheme takes advantage of the nature of the decision tree induction problem and improves the memory efficiency of the system without any compromise on the performance of the classifier. It is demonstrated that the scheme can reduce memory usage by up to 66% for the training datasets under investigation without compromising training accuracy, while a 28% reduction in training time is achieved due to the extra processing power enabled by the additional memory bandwidth.

  • Book Chapter
  • Citations: 12
  • 10.1007/978-3-642-04985-9_12
RBDT-1: A New Rule-Based Decision Tree Generation Technique
  • Jan 1, 2009
  • Amany Abdelhalim + 2 more

Most of the methods that generate decision trees use examples of data instances in the decision tree generation process. This paper proposes a method called "RBDT-1" (rule-based decision tree) for learning a decision tree from a set of decision rules that cover the data instances rather than from the data instances themselves. The method's goal is to create on demand a short and accurate decision tree from a stable or dynamically changing set of rules. We conduct a comparative study of RBDT-1 with three existing decision tree methods on different problems. The outcome of the study shows that RBDT-1 performs better than AQDT-1 and AQDT-2, which are rule-based decision tree methods, in terms of tree complexity (number of nodes and leaves in the decision tree). It is also shown that RBDT-1 performs equally well in terms of tree complexity compared with C4.5, which generates a decision tree from data examples. Keywords: attribute selection criteria; decision rules; data-based decision tree; rule-based decision tree; tree complexity.

  • Conference Article
  • Citations: 6
  • 10.1109/icmlc.2005.1527372
An initial comparison on noise resisting between crisp and fuzzy decision trees
  • Jan 1, 2005
  • Juan Sun + 1 more

Decision tree induction is an effective method for solving classification problems in the machine learning domain. In general, there are two types of decision tree induction: crisp decision trees and fuzzy decision trees. Both kinds of induction, when based on real-world data, are unlikely to obtain an entirely accurate training set; this means noise exists in the training set. It should be noted that noise can either cause attributes to become inadequate or make the decision tree more complicated. It is necessary to further investigate decision trees where the influence of noisy data is considered. Experimentally, the paper analyzes the effect of three types of noise, compares the noise-tolerance capability of fuzzy decision trees and crisp decision trees, discusses the modified degree of pruning methods in both fuzzy and crisp decision trees, and addresses the adjustable capability on noise achieved by using different fuzzy reasoning operators in the fuzzy decision tree. Finally, the empirical results show that the fuzzy decision tree is more robust than both the crisp decision tree and the post-pruned crisp decision tree.

  • Book Chapter
  • 10.1007/978-3-642-20320-6_16
Unified View of Decision Tree Learning Machines for the Purpose of Meta-learning
  • Jan 1, 2011
  • Krzysztof Grąbczewski

The experience gained from thorough analysis of many decision tree (DT) induction algorithms has resulted in a unified model for DT construction and reliable testing. The model has been designed and implemented within Intemi, a versatile environment for data mining. Its modular architecture facilitates construction of all the most popular algorithms by combining proper building blocks. Alternative components can be reliably compared by tests in the same environment. This is the starting point for manifold research in the area of DTs, which will bring advanced meta-learning algorithms providing new knowledge about DT induction and optimal DT models for many kinds of data. Keywords: decision trees; meta-learning; object-oriented design.

  • Research Article
  • Citations: 12
  • 10.1002/cem.816
Induction of decision trees using fuzzy partitions
  • Oct 1, 2003
  • Journal of Chemometrics
  • A J Myles + 1 more

A new method for the induction of fuzzy decision trees is introduced. The fuzzy decision tree classifier improves prediction accuracy using smaller models by locating more robust splitting regions. The proposed method also provides a measure of confidence for sample classification by propagating partition memberships into all leaf nodes, thereby relaxing local subspace restrictions. The fuzzy decision tree algorithm is presented and compared against standard and bagged decision tree classifiers.
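The membership-propagation idea can be sketched for a single fuzzy split, assuming a linear transition band around the threshold and illustrative leaf class supports (not the paper's actual partition functions):

```python
# Hedged sketch of a fuzzy split: instead of a hard threshold, membership in
# the left/right branch decays linearly inside a transition band around the
# split point, so samples near the boundary contribute to both leaves.
# The band width, threshold, and leaf supports are illustrative assumptions.

def fuzzy_membership(x, threshold, band):
    """Return (left, right) memberships that always sum to 1."""
    if x <= threshold - band:
        return 1.0, 0.0
    if x >= threshold + band:
        return 0.0, 1.0
    right = (x - (threshold - band)) / (2 * band)  # linear ramp across the band
    return 1.0 - right, right

def classify(x, threshold=5.0, band=1.0,
             left_support=(0.9, 0.1), right_support=(0.2, 0.8)):
    """Propagate memberships into both leaves and blend their class supports."""
    ml, mr = fuzzy_membership(x, threshold, band)
    return tuple(ml * l + mr * r for l, r in zip(left_support, right_support))

print(classify(3.0))   # deep in the left region: pure left-leaf support
print(classify(5.0))   # on the boundary: a 50/50 blend of both leaves
```

The blended support vector doubles as the confidence measure the abstract mentions: a sample near the boundary yields supports close to uniform rather than a crisp label.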

  • Research Article
  • Citations: 5
  • 10.1016/j.asoc.2024.112261
A preordonance-based decision tree method and its parallel implementation in the framework of Map-Reduce
  • Dec 1, 2024
  • Applied Soft Computing
  • Hasna Chamlal + 2 more

  • Research Article
  • 10.21460/inf.2013.91.145
Decision Tree Generator Applying the C4.5 Algorithm for a Consultation Program (original title: "Generator Pohon Keputusan dengan Menerapkan Algoritma C4.5 untuk Program Konsultasi")
  • Jul 31, 2013
  • Irma Kharis + 2 more

The C4.5 algorithm is used to simplify a decision tree built from a decision table by generating a new decision tree from the existing one. With this algorithm, the knowledge base in the decision table can be simplified. This research builds a consultation program using the C4.5 algorithm, called the decision tree generator. The decision tree generator provides an inference facility and a user interface for consultation. The user is required to build a knowledge base first, and the application will generate the user interface automatically. There are two steps in the decision tree generator: first, the application builds the decision tree, and then it builds the user interface for the consultation session. The results of this research show that the decision tree generator can reach a goal and give advice from tree exploration in a consultation session.

  • Research Article
  • Citations: 7
  • 10.1177/0272989x0102100503
Identifying diagnostic errors with induced decision trees.
  • Oct 1, 2001
  • Medical Decision Making
  • Catherine K Murphy

The purpose of this article is to compare the diagnostic accuracy of induced decision trees with that of pruned neural networks and to improve the accuracy and interpretation of breast cancer diagnosis from readings of fine-needle aspirate by identifying cases likely to be misclassified by induced decision rules. Using an online database consisting of 699 cases of suspected breast cancer and their corresponding readings of fine-needle aspirate, decision trees were induced from half of the cases, randomly selected. Accuracy was determined for the remaining cases in successive partitions. The pattern of errors in the multiple decision trees was examined. A smaller data set was created with 2 classes: (1) correctly classified and (2) misclassified by a decision tree, rather than the original benign and malignant classes. From this data set, decision trees that describe the misclassified cases were induced. Larger, less severely pruned decision trees were more accurate in breast cancer diagnosis for both training and test data. The accuracy of the induced decision trees exceeded that reported for the smaller pruned neural networks. Combining classifications from 2 trees was effective in identifying malignancies missed by a single tree. Induced decision trees were able to identify patterns associated with misclassified cases, but the identification of errors inductively did not improve the overall error rate. In this application, a model that is too compact identifies fewer cases of the minority class, malignancy. New methods that combine models and examine classification errors can improve diagnosis by identifying more malignancies and by describing ambiguous cases.
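The two-tree combination mentioned above can be sketched as a simple union of positive calls, with toy predictions standing in for the induced trees:

```python
# Hedged sketch: flag a case as malignant if EITHER tree says so, trading
# some false positives for fewer missed malignancies. Predictions are toy data.

def combine_or(preds_a, preds_b, positive="malignant"):
    """Union the positive calls of two classifiers."""
    return [positive if a == positive or b == positive else a
            for a, b in zip(preds_a, preds_b)]

tree_a = ["benign", "malignant", "benign", "benign"]
tree_b = ["benign", "malignant", "malignant", "benign"]
truth  = ["benign", "malignant", "malignant", "benign"]

combined = combine_or(tree_a, tree_b)
missed = sum(t == "malignant" and p != "malignant"
             for t, p in zip(truth, combined))
print(combined, missed)  # the malignancy tree A missed is now caught
```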
