Decision Tree Induction Method Research Articles

The invasion of freshwater ecosystems is a particularly alarming phenomenon in the Iberian Peninsula. Habitat suitability modelling is an effective approach to extract knowledge about species ecology and to guide appropriate management actions. Decision-trees are an interpretable modelling technique widely used in ecology, able to handle strongly nonlinear relationships with high-order interactions and diverse variable types. Decision-trees recursively split the input space into two parts, maximising child node homogeneity. This recursive partitioning is typically performed with axis-parallel splits in a top-down fashion. However, two recently developed R packages seem to outperform the standard axis-parallel top-down algorithms, CART and C5.0: oblique.tree, which builds decision-trees from oblique splits, and evtree, which searches for globally optimal trees with evolutionary algorithms. To evaluate their possible use in ecology, the two new partitioning algorithms were compared with the two well-known, standard axis-parallel algorithms. The entire process was performed in R by simultaneously tuning the decision-tree parameters and the variable subset with a genetic algorithm while modelling the presence–absence of the Iberian gudgeon (Gobio lozanoi; Doadrio and Madeira, 2004), an invasive fish species that has spread across the Iberian Peninsula. The accuracy and complexity of the trees, the modelled patterns of mesohabitat selection and the variable importance were compared. Neither of the new R packages, oblique.tree or evtree, outperformed the C5.0 algorithm. They rendered almost the same decision-trees as the CART algorithm, and these trees were fully interpretable, with four to eight partitions, whereas C5.0 produced a more complex structure with 17 partitions.
The oblique.tree package proved to be affected by prevalence and does not allow observations to be weighted, which may discourage its practical use. Although evtree did not offer a major improvement over the remaining packages, it allows the development of regression trees, which may be informative for additional modelling tasks such as abundance estimation. According to the resulting decision-trees, the optimal habitats for the Iberian gudgeon were large pools in lowland river segments with depositional areas and aquatic vegetation present, typically in the form of scattered macrophyte clumps. Furthermore, the Iberian gudgeon seems to avoid habitats characterised by scouring and limited vegetated cover. Accordingly, we can assume that river regulation and artificial impoundment would have favoured the spread of the Iberian gudgeon across the entire peninsula.
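
The top-down, axis-parallel partitioning described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' R workflow and does not reproduce CART, C5.0, oblique.tree or evtree. It assumes Gini impurity as the homogeneity measure and midpoints between observed values as candidate thresholds. The function names and the toy data (depth in cm, vegetation cover in %) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_axis_parallel_split(rows, labels):
    """Return (feature, threshold, weighted impurity) of the best test
    'row[feature] <= threshold', or None if no split is possible."""
    n = len(rows)
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


def grow(rows, labels, depth=0, max_depth=3):
    """Greedy top-down induction: split, then recurse into each child."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or gini(labels) == 0.0:
        return majority  # leaf: predict the majority class
    split = best_axis_parallel_split(rows, labels)
    if split is None:
        return majority  # all rows identical, nothing to split on
    j, t, _ = split
    left = [i for i, r in enumerate(rows) if r[j] <= t]
    right = [i for i, r in enumerate(rows) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([rows[i] for i in left], [labels[i] for i in left],
                     depth + 1, max_depth),
        "right": grow([rows[i] for i in right], [labels[i] for i in right],
                      depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the axis-parallel tests down to a leaf label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical toy data: [depth_cm, vegetation_cover_pct] -> absence (0) / presence (1)
rows = [[20, 0], [30, 10], [40, 0], [110, 60], [130, 50], [150, 70]]
labels = [0, 0, 0, 1, 1, 1]
tree = grow(rows, labels)
print(tree)
print(predict(tree, [120, 55]))
```

Each split here tests a single variable against a threshold, which is what makes it axis-parallel. An oblique split would instead test a linear combination of variables, and a globally optimal search would score whole trees rather than choosing one split at a time.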

Introduction: The present work compared the predictive power of different data mining techniques used to develop an HIV testing prediction model. Four popular data mining algorithms (decision tree, Naive Bayes, neural network and logistic regression) were used to build a model that predicts whether an individual has been tested for HIV among adults in Ethiopia, using data from the 2011 Ethiopia Demographic and Health Survey (EDHS). The final experimental results indicated that the decision tree (random tree algorithm) performed best, with an accuracy of 96%. The decision tree induction method (J48) was second best, with a classification accuracy of 79%, followed by the neural network (78%). Logistic regression achieved the lowest classification accuracy (74%).
Objectives: The objective of this study is to compare the predictive power of the different data mining techniques used to develop the HIV testing prediction model.
Methods: The Cross-Industry Standard Process for Data Mining (CRISP-DM) was followed to build the HIV testing prediction model and to explore association rules between HIV testing and the selected attributes. Data preprocessing was performed, and missing values of categorical variables were replaced by the modal value of the variable. Different data mining techniques were used to build the predictive model.
Results: The target dataset contained 30,625 study participants, of whom 16,515 (54%) were women and the remaining 14,110 (46%) were men. The age of the participants ranged from 15 to 59 years, with a modal age group of 15 to 19 years. Among the study participants, 17,719 (58%) had never been tested for HIV, while the remaining 12,906 (42%) had been tested. Residence, educational level, wealth index, HIV-related stigma, knowledge related to HIV, region, age group, risky sexual behaviour attributes, knowledge about where to test for HIV and knowledge of family planning through mass media were found to be predictors of HIV testing.
Conclusion and Recommendation: The results of this research show that data mining is crucial for extracting relevant information for the effective utilization of HIV testing services, which has clinical, community and public health importance at all levels. It is vital to apply different data mining techniques in the same setting and to compare the model performances (based on accuracy, sensitivity and specificity) with each other. Furthermore, this study invites interested researchers to explore the application of data mining techniques in the healthcare industry and in related or similar settings.
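
The three comparison criteria named in this abstract (accuracy, sensitivity and specificity) can be computed from a confusion matrix, as in the minimal Python sketch below. The function name and the toy labels are hypothetical and only illustrative. They are not taken from the EDHS data or from the study's models.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity for a binary classifier.

    y_true and y_pred are equal-length sequences of class labels;
    'positive' marks the class of interest (here: tested for HIV).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of actual positives that were predicted positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of actual negatives that were predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical labels: 1 = tested, 0 = never tested
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
print(binary_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are unevenly distributed, because a model can reach high accuracy while missing most cases of the less frequent class.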

Articles published on Decision Tree Induction Method

An Experimental Comparison of Self-Adaptive Differential Evolution Algorithms to Induce Oblique Decision Trees

Bsnsing: A Decision Tree Induction Method Based on Recursive Optimal Boolean Rule Composition

A branch & bound algorithm to determine optimal bivariate splits for oblique decision tree induction

Soil carbon stock in archaeological black earth under different land use systems in the Brazilian Amazon

Rule-Based Classification using Multi Soft Set Theory

Application of the decision tree method in forensic-medical practice in the analysis of 'doctors proceedings'

Similarity-based decision tree induction method and its application to cancer recognition on tomographic images

A Hybrid Data Mining Approach for Generalizing Characteristics of Emergency Department Visits Causing Overcrowding

Construction of Near-Optimal Axis-Parallel Decision Trees Using a Differential-Evolution-Based Approach

Large-scale data analysis on aviation accident database using different data mining techniques

Manifold Learning Co-Location Decision Tree for Remotely Sensed Imagery Classification

Comparing four methods for decision-tree induction: A case study on the invasive Iberian gudgeon (Gobio lozanoi; Doadrio and Madeira, 2004)

VR-BFDT: A variance reduction based binary fuzzy decision tree induction method for protein function prediction

Comparing Data Mining Techniques in HIV Testing Prediction

Interval-valued fuzzy decision trees with optimal neighbourhood perimeter

A prospective field study for sensor-based identification of fall risk in older people with dementia

Fuzzy Decision Tree

A Method for Supporting the Domain Expert by the Interpretation of Different Decision Trees Learnt from the Same Domain

A Multi-Relational Decision Tree Learning (MRDTL) Approach: A Survey

Hierarchical multi-label classification using local neural networks
