Implementation Algorithm C4.5 To Find Recommendation From Gym Exercise Data

Abstract

Sport has become an important aspect of human health. One of the most common issues discovered when participating in sports is a lack of knowledge about the sport itself. The purpose of this research is to generate recommendations by implementing the C4.5 algorithm and a Decision Tree. The research also applies several pre-processing methods, consisting of Data Cleaning, Feature Selection, and Data Transformation, to deliver the best data results. The data used in this research are movement data recorded at the gym. The performance of the C4.5 algorithm is determined through Validation and Testing, which in this case include Accuracy, Precision, and Recall. This research produces recommendations from the implementation of C4.5: the resulting Decision Tree is examined so that a recommendation can be made.
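A minimal sketch of the pieces this abstract names: C4.5's gain-ratio split criterion and the precision/recall metrics used in the validation step. The gym attributes and labels below are invented for illustration, not taken from the paper's data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(attr_values, labels):
    """C4.5's split criterion: information gain normalised by split info."""
    n = len(labels)
    partitions = {}
    for v, y in zip(attr_values, labels):
        partitions.setdefault(v, []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    split_info = entropy(attr_values)  # penalises many-valued attributes
    return gain / split_info if split_info > 0 else 0.0

def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class, as used in the validation step."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fp), tp / (tp + fn)

# toy gym-movement rows (attribute values and labels are invented)
experience = ["beginner", "beginner", "advanced", "advanced"]
exercise   = ["machine", "machine", "free-weight", "free-weight"]
print(gain_ratio(experience, exercise))  # 1.0: a perfect split
print(precision_recall(exercise,
                       ["machine", "machine", "free-weight", "machine"],
                       positive="machine"))
```

In a full C4.5 tree, the attribute with the highest gain ratio becomes the split node and the procedure recurses on each partition.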

Similar Papers
  • Conference Article
  • 10.3990/2.378
Value of feature reduction for crop differentiation using multi-temporal imagery, machine learning, and object-based image analysis
  • Jan 1, 2016
  • J.K Gilbertson + 1 more

This study examined the value of automated and manual feature selection, when applied to machine learning and object-based image analysis (OBIA), for the differentiation of crops in a Mediterranean climate. Five Landsat 8 images covering the phenological stages of seven major crop types in the study area (Cape Winelands, South Africa) were acquired and processed. A statistical image fusion technique was used to enhance the spatial resolution of the imagery. The pan-sharpened imagery was used to produce a range of spectral features, textural measures, indices and colour transformations, after which it was segmented using the multi-resolution segmentation (MRS) algorithm. The entire set of 205 features (41 per image capture date) was then subjected to different feature selection and reduction methods. The feature selection and reduction methods included manual feature removal (i.e. grouping into semantic themes), filter methods (such as classification and regression trees (CART) and random forest (RF)), and statistical principal components analysis (PCA). The experiments were carried out in two scenarios, namely 1) on all input images in combination; and 2) on each individual image date. The feature subsets were used as input to decision trees (DTs), k-nearest neighbour (k-NN), support vector machine (SVM), and random forest (RF) machine learning classifiers. In order to assess the value of each feature reduction method (comprising feature reduction and selection techniques), overall accuracy, kappa coefficient and McNemar’s test were employed to assess classification accuracy and compare the results. The results show that feature selection was able to improve the overall crop identification accuracy for the DT, k-NN, and RF classifiers, but was unable to do so for SVM. SVM scored the highest overall accuracy and kappa coefficient, even without applying feature reduction or selection. Based on these results it was concluded that, although feature selection can aid the crop differentiation process, it is not a necessity.

  • Research Article
  • 10.35145/jabt.v5i3.155
Comparison of Feature Selection with Information Gain Method in Decision Tree, Regression Logistic and Random Forest Algorithms
  • Sep 30, 2024
  • Journal of Applied Business and Technology
  • Muhammad Sholeh + 2 more

One of the approaches that can be taken is to perform feature selection. Feature selection is done by identifying the most informative features and discarding features that do not directly contribute to the target feature. The purpose of feature selection is to increase the accuracy of the model. The research was conducted by comparing the performance of the model without any feature selection against the model with feature selection. The process is done by comparing the accuracy results of the decision tree, logistic regression, and random forest algorithms. In the research method of feature selection on science data, the steps include understanding the domain and dataset, exploratory analysis, data cleaning, measuring feature relevance with criteria such as Information Gain, and feature ranking. The results are evaluated and validated using model performance metrics before and after feature selection. This process ensures selection of relevant features, improving accuracy. The research used the Lung Cancer Prediction dataset, which consists of 306 rows and 16 attributes. The results show that feature selection can improve the performance of the classification model by reducing features that do not contribute to the target. Comparing the decision tree, logistic regression, and random forest classification algorithms with feature selection resulted in the highest accuracy value of 0.968 for the logistic regression algorithm with five selected features.
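The Information Gain ranking step described here can be sketched as follows; the feature columns and labels are hypothetical, not taken from the Lung Cancer Prediction dataset.

```python
import math
from collections import Counter

def entropy(ys):
    n = len(ys)
    return -sum(c / n * math.log2(c / n) for c in Counter(ys).values())

def info_gain(xs, ys):
    """Entropy of the labels minus the entropy remaining after splitting on xs."""
    n = len(ys)
    parts = {}
    for x, y in zip(xs, ys):
        parts.setdefault(x, []).append(y)
    return entropy(ys) - sum(len(p) / n * entropy(p) for p in parts.values())

# hypothetical feature columns and a 0/1 target
features = {
    "smoker":   [1, 1, 0, 0, 1, 0],
    "age_band": [0, 1, 0, 1, 0, 1],
}
label = [1, 1, 0, 0, 1, 0]

# rank features by information gain, most informative first
ranked = sorted(features, key=lambda f: info_gain(features[f], label), reverse=True)
print(ranked)  # ['smoker', 'age_band']
```

Keeping only the top-k ranked features, then re-training and re-evaluating, gives the before/after comparison the abstract describes.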

  • Research Article
  • Cited by 3
  • 10.1088/2631-8695/ad3380
Optimizing bearing health condition monitoring: exploring correlation feature selection algorithm
  • Apr 18, 2024
  • Engineering Research Express
  • Anju Sharma + 2 more

Vibration signals are a critical source of information for detecting and diagnosing bearing faults, making this research particularly relevant to the condition monitoring of industrial machinery, especially bearings, using vibration signals. This study delves into how feature selection can be done using Pearson’s Correlation Coefficient within the context of monitoring bearing health conditions, utilizing two distinct approaches. Approach-1 involves feature selection without considering labels, while Approach-2 incorporates labels for feature selection. Comparative analysis is conducted against outcomes obtained when all features are selected. The research scrutinizes the impact of feature selection on classifier performance, accuracy, and execution times, utilizing various machine learning algorithms such as Decision Tree (DT), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Naïve Bayes (NB). The findings underscore that feature selection significantly enhances classifier accuracy while reducing execution times. Specifically, only DT and KNN with 50 neighbors achieved 100% accuracy when all features were considered. However, with feature selection using Approach-1 (without labels), DT, KNN, SVM (excluding 100 neighbors), and NB (with a Normal/Gaussian kernel) attained 100% accuracy. Employing Approach-2 (with labeled features), DT with 0.7 and 0.9 thresholds, SVM-G with all thresholds (0.6, 0.7, and 0.9), KNN with all thresholds (except 100 neighbors), and NB-n (with all thresholds) achieved 100% accuracy. The study emphasizes the pivotal role of feature selection using Pearson’s Correlation Coefficient in enhancing machine learning classifier performance, offering promising avenues for future research and practical applications across diverse domains.
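A rough sketch of the label-aware variant (Approach-2): keep only features whose absolute Pearson correlation with the class label meets a threshold. The feature names and values below are invented stand-ins for vibration features, and the thresholds mirror those quoted in the abstract.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient (assumes non-constant inputs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_with_labels(features, label, threshold):
    """Approach-2 style: keep features whose |r| with the label meets the threshold."""
    return [name for name, col in features.items()
            if abs(pearson(col, label)) >= threshold]

features = {           # invented stand-ins for vibration features
    "rms":      [0.1, 0.4, 0.5, 0.9],
    "kurtosis": [3.1, 3.0, 3.2, 2.9],
}
label = [0, 0, 1, 1]   # 0 = healthy, 1 = faulty
print(select_with_labels(features, label, 0.7))  # ['rms']
```

Approach-1 (without labels) would instead correlate features against each other and drop one of each strongly correlated pair.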

  • Conference Article
  • Cited by 2
  • 10.1145/3388440.3412426
Smart Computational Approaches with Advanced Feature Selection Algorithms for Optimizing the Classification of Mobility Data in Health Informatics
  • Sep 21, 2020
  • Elham Rastegari + 2 more

Recently, wearable mobility monitoring devices have gained a great deal of attention for collecting movement and gait-related data. Moreover, wearable movement monitoring devices together with machine learning techniques have been shown to be successful in a variety of healthcare applications, including diagnosis, prognosis, and rehabilitation. However, advanced studies are needed to create accurate and robust models that can differentiate between different populations based on their mobility signatures. This is particularly critical for monitoring movement and gait patterns of individuals impacted by neurodegenerative conditions such as Parkinson's Disease (PD). In order to achieve this goal, it is critical to employ a robust approach to model available data and identify the optimal set of movement parameters for the classification process. In this work, we propose a computational approach to identify the best feature selection method for spatiotemporal gait parameters. We investigate several feature selection approaches and analyze their performance on the mobility classification problem, including maximum information gain with minimum correlation (MIGMC), maximum signal-to-noise ratio with minimum correlation (MSNR&MC), genetic algorithms (GA), decision trees (DT), and principal component analysis (PCA). These methods, along with newly proposed variations, are assessed in terms of classification accuracy, the number of selected features, and computation time. Data collected from triaxial accelerometers attached to the ankles of individuals with PD, geriatric patients (GE), and healthy elderly (HE) were used to train and test a set of six different machine learning techniques. Our results indicate that three out of the six feature selection methods, namely GA, MSNR&MC, and a modified version of MIGMC, are the best performers regarding classification accuracy.
We also show that higher degrees of robust performances are achieved when employing multiple algorithms, such as decision trees and genetic algorithms. This study provides a critical first step towards the much-needed goal of utilizing data collected from wearable devices to extract important information for the diagnosis and rehabilitation of many movement-related medical conditions.

  • Research Article
  • Cited by 19
  • 10.1109/tkde.2021.3102120
Interactive Reinforcement Learning for Feature Selection with Decision Tree in the Loop
  • Jan 1, 2021
  • IEEE Transactions on Knowledge and Data Engineering
  • Wei Fan + 5 more

We study the problem of balancing effectiveness and efficiency in automated feature selection. After exploring many feature selection methods, we observe a computational dilemma: 1) traditional feature selection is mostly efficient, but has difficulty identifying the best subset; 2) the emerging reinforced feature selection automatically navigates to the best subset, but is usually inefficient. Can we bridge the gap between effectiveness and efficiency under automation? Motivated by this dilemma, we aim to develop a novel feature space navigation method. In our preliminary work, we leveraged interactive reinforcement learning to accelerate feature selection via external trainer-agent interaction. In this journal version, we propose a novel interactive and closed-loop architecture to simultaneously model interactive reinforcement learning (IRL) and decision tree feedback (DTF). Specifically, IRL creates an interactive feature selection loop and DTF feeds structured feature knowledge back into the loop. First, the tree-structured feature hierarchy from the decision tree is leveraged to improve state representation. In particular, we represent the selected feature subset as an undirected graph of feature-feature correlations and a directed tree of decision features. We propose a new embedding method capable of empowering a graph convolutional network to jointly learn state representation from both the graph and the tree. Second, the tree-structured feature hierarchy is exploited to develop a new reward scheme. In particular, we personalize reward assignment of agents based on decision tree feature importance. In addition, observing that agents' actions can serve as feedback, we devise another reward scheme that weighs and assigns reward based on each feature's selection frequency in historical action records. Finally, we present extensive experiments on real-world datasets to show the improved performance.

  • Conference Article
  • Cited by 8
  • 10.1109/otcon56053.2023.10113995
A Comparison of Feature Selection Approaches for Liver Disease Data
  • Feb 8, 2023
  • Ayushi Pillay + 3 more

Efficient feature selection (FS) methods are often needed to correctly classify multiple types of diseases, recognize disease symptoms, and enhance treatment modalities. The aim of this study is to compare filter and wrapper FS approaches and to show how they enhance classification. First, the FS approaches were employed with different classification models on disease data. We compared the effectiveness of two FS techniques: recursive feature elimination (RFE) and chi-square feature (CSF) selection. Three different models are used: Logistic Regression (LR), Support Vector Machine (SVM), and Decision Tree (DT). Classification performance is evaluated using K-fold cross-validation. The methods are evaluated on openly accessible disease datasets. The findings showed that FS is important for accurate disease classification. A subset of the most prevalent characteristics is obtained after examining the relationship between particular symptoms and disease. Thus, by using FS that selects only a few disease predictors, the classification accuracy can be significantly improved.
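The chi-square score underlying CSF selection can be illustrated on a small contingency table; the symptom/disease counts below are made up for the example.

```python
def chi_square(observed):
    """Chi-square statistic for a 2x2 contingency table
    [[symptom present & diseased, present & healthy],
     [symptom absent & diseased,  absent & healthy]]."""
    row = [sum(r) for r in observed]
    col = [sum(c) for c in zip(*observed)]
    total = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total  # counts expected under independence
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

# hypothetical counts: a symptom that tracks the disease closely
print(round(chi_square([[30, 10], [10, 30]]), 2))  # 20.0
```

Features are ranked by this statistic and only the highest-scoring ones are kept; RFE, by contrast, repeatedly drops the feature a fitted model weights least.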

  • Research Article
  • Cited by 18
  • 10.3390/app13106138
Explainable Mortality Prediction Model for Congestive Heart Failure with Nature-Based Feature Selection Method
  • May 17, 2023
  • Applied Sciences
  • Nusrat Tasnim + 4 more

A mortality prediction model can be a great tool to assist physicians in decision making in the intensive care unit (ICU) in order to ensure optimal allocation of ICU resources according to the patient’s health conditions. The entire world witnessed a severe ICU patient capacity crisis a few years ago during the COVID-19 pandemic. Various widely utilized machine learning (ML) models in this research field can perform poorly due to a lack of proper feature selection. Despite the fact that nature-based algorithms in other sectors perform well for feature selection, no comparative study on the performance of nature-based algorithms in feature selection has been conducted in the ICU mortality prediction field. Therefore, in this research, a comparison of the performance of ML models with and without feature selection was performed. In addition, explainable artificial intelligence (AI) was used to examine the contribution of features to the decision-making process. Explainable AI focuses on establishing transparency and traceability for statistical black-box machine learning techniques. Explainable AI is essential in the medical industry to foster public confidence and trust in machine learning model predictions. Three nature-based algorithms, namely the flower pollination algorithm (FPA), particle swarm optimization (PSO), and genetic algorithm (GA), were used in this study. For the classification job, the most widely used and diversified classifiers from the literature were used, including logistic regression (LR), decision tree (DT) classifier, the gradient boosting (GB) algorithm, and the random forest (RF) algorithm. The Medical Information Mart for Intensive Care III (MIMIC-III) dataset was used to collect data on heart failure patients. On the MIMIC-III dataset, it was discovered that feature selection significantly improved the performance of the described ML models.
Without applying any feature selection process on the MIMIC-III heart failure patient dataset, the accuracy of the four mentioned ML models, namely LR, DT, RF, and GB was 69.9%, 82.5%, 90.6%, and 91.0%, respectively, whereas with feature selection in combination with the FPA, the accuracy increased to 71.6%, 84.8%, 92.8%, and 91.1%, respectively, for the same dataset. Again, the FPA showed the highest area under the receiver operating characteristic (AUROC) value of 83.0% with the RF algorithm among all other algorithms utilized in this study. Thus, it can be concluded that the use of feature selection with FPA has a profound impact on the outcome of ML models. Shapley additive explanation (SHAP) was used in this study to interpret the ML models. SHAP was used in this study because it offers mathematical assurances for the precision and consistency of explanations. It is trustworthy and suitable for both local and global explanations. It was found that the features that were selected by SHAP as most important were also most common with the features selected by the FPA. Therefore, we hope that this study will help physicians to predict ICU mortality for heart failure patients with a limited number of features and with high accuracy.

  • Conference Article
  • 10.65286/icic.v21i3.25735
A Decision Tree Based On Related Family
  • Jan 1, 2025
  • Wenxing Li

Decision trees are widely used supervised learning models known for their simplicity, interpretability, and effectiveness in classification and regression tasks. Feature selection can remove redundant and noisy features, enhancing the generalization and robustness of decision trees. However, due to the high computational cost of existing methods, feature selection is typically applied only once before classifier training, providing the classifier with dimensionally reduced data. This limits the synergistic effect between feature selection and the construction of split nodes in decision trees. The Related Family is an efficient feature evaluation method proposed by our research team. Its efficiency allows us to use it in the construction of split nodes in decision trees, leading to better splitting criteria. Building on this method, we introduce the Dynamic Related Family Decision Tree (DRFDT), which dynamically selects optimal features for each sample subgroup as the tree grows. Experiments demonstrate that DRFDT outperforms a wide range of classification algorithms across 15 UCI datasets, achieving an average accuracy of 89.30%. This represents significant improvements over classical single-feature decision tree methods (CART: +3.87%), traditional classification algorithms (KNN: +5.71%, SVM: +4.54%), multi-feature split decision tree algorithms (CART-LC: +3.99%, O1: +4.25%), and state-of-the-art decision tree classification algorithms (FGBDT: +4.88%, MPRBC: +4.77%, RSLRS: +26.84%).

  • Conference Article
  • Cited by 14
  • 10.1109/igarss.2006.48
Random Feature Selection for Decision Tree Classification of Multi-temporal SAR Data
  • Jul 1, 2006
  • B Waske + 2 more

The accuracy of supervised land cover classifications depends on variables like the chosen algorithm, adequate training data and the selection of features. It has been shown that classification results can be improved by classifier ensembles. In the present study, decision trees were generated with random selections of the available features and combined into such a multiple classifier. The influence of the number of selected features and the size of the multiple classifier on classification accuracy is investigated using a set of 14 SAR images. Results of the multiple classifiers are always better than those of a decision tree based on all available features. Maximum accuracies were achieved with multiple classifiers whose decision trees use 70% of the available features. The visual inspection of the produced maps underlines the high quality of the results: the area is classified into homogeneous fields with only little noise.
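The random-feature-subset ensemble idea can be sketched as below. For brevity a 1-nearest-neighbour rule stands in for the decision tree base learner (a deliberate simplification), and the data are invented toy values, not SAR imagery; the 70% default mirrors the fraction reported in the abstract.

```python
import random
from collections import Counter

def nn_predict(train_X, train_y, x, feats):
    """1-nearest-neighbour restricted to a feature subset
    (a stand-in here for the decision tree base learner)."""
    def dist(a):
        return sum((a[f] - x[f]) ** 2 for f in feats)
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i]))
    return train_y[best]

def random_subspace_predict(train_X, train_y, x, n_members=11, frac=0.7, seed=0):
    """Majority vote of members, each seeing a random ~70% of the features."""
    rng = random.Random(seed)
    n_feats = len(train_X[0])
    k = max(1, round(frac * n_feats))
    votes = [nn_predict(train_X, train_y, x, rng.sample(range(n_feats), k))
             for _ in range(n_members)]
    return Counter(votes).most_common(1)[0][0]

# toy 3-feature samples (invented values, not SAR data)
X = [[0.1, 0.2, 0.1], [0.9, 0.8, 0.9], [0.2, 0.1, 0.2], [0.8, 0.9, 0.8]]
y = ["water", "field", "water", "field"]
print(random_subspace_predict(X, y, [0.85, 0.85, 0.9]))  # field
```

Because each member sees a different feature subset, their errors decorrelate and the vote tends to beat any single tree trained on all features.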

  • Research Article
  • Cited by 39
  • 10.1162/evco_a_00102
A Scalable Memetic Algorithm for Simultaneous Instance and Feature Selection
  • Aug 8, 2013
  • Evolutionary Computation
  • Nicolás García-Pedrajas + 2 more

Instance selection is becoming increasingly relevant due to the huge amount of data that is constantly produced in many fields of research. At the same time, most of the recent pattern recognition problems involve highly complex datasets with a large number of possible explanatory variables. For many reasons, this abundance of variables significantly harms classification or recognition tasks. There are efficiency issues, too, because the speed of many classification algorithms is largely improved when the complexity of the data is reduced. One of the approaches to address problems that have too many features or instances is feature or instance selection, respectively. Although most methods address instance and feature selection separately, both problems are interwoven, and benefits are expected from facing these two tasks jointly. This paper proposes a new memetic algorithm for dealing with many instances and many features simultaneously by performing joint instance and feature selection. The proposed method performs four different local search procedures with the aim of obtaining the most relevant subsets of instances and features to perform an accurate classification. A new fitness function is also proposed that enforces instance selection but avoids putting too much pressure on removing features. We prove experimentally that this fitness function improves the results in terms of testing error. Regarding the scalability of the method, an extension of the stratification approach is developed for simultaneous instance and feature selection. This extension allows the application of the proposed algorithm to large datasets. An extensive comparison using 55 medium to large datasets from the UCI Machine Learning Repository shows the usefulness of our method. Additionally, the method is applied to 30 large problems, with very good results. The accuracy of the method for class-imbalanced problems in a set of 40 datasets is shown. 
The usefulness of the method is also tested using decision trees and support vector machines as classification methods.

  • Research Article
  • Cited by 25
  • 10.3390/sym13112166
Software Defect Prediction Using Wrapper Feature Selection Based on Dynamic Re-Ranking Strategy
  • Nov 12, 2021
  • Symmetry
  • Abdullateef Oluwagbemiga Balogun + 9 more

Finding defects early in a software system is a crucial task, as it creates adequate time for fixing such defects using available resources. Strategies such as symmetric testing have proven useful; however, its inability in differentiating incorrect implementations from correct ones is a drawback. Software defect prediction (SDP) is another feasible method that can be used for detecting defects early. Additionally, high dimensionality, a data quality problem, has a detrimental effect on the predictive capability of SDP models. Feature selection (FS) has been used as a feasible solution for solving the high dimensionality issue in SDP. According to current literature, the two basic forms of FS approaches are filter-based feature selection (FFS) and wrapper-based feature selection (WFS). Between the two, WFS approaches have been deemed to be superior. However, WFS methods have a high computational cost due to the unknown number of executions available for feature subset search, evaluation, and selection. This characteristic of WFS often leads to overfitting of classifier models due to its easy trapping in local maxima. The trapping of the WFS subset evaluator in local maxima can be overcome by using an effective search method in the evaluator process. Hence, this study proposes an enhanced WFS method that dynamically and iteratively selects features. The proposed enhanced WFS (EWFS) method is based on incrementally selecting features while considering previously selected features in its search space. The novelty of EWFS is based on the enhancement of the subset evaluation process of WFS methods by deploying a dynamic re-ranking strategy that iteratively selects germane features with a low subset evaluation cycle while not compromising the prediction performance of the ensuing model. For evaluation, EWFS was deployed with Decision Tree (DT) and Naïve Bayes classifiers on software defect datasets with varying granularities. 
The experimental findings revealed that EWFS outperformed existing metaheuristics and sequential search-based WFS approaches established in this work. Additionally, EWFS selected fewer features with less computational time as compared with existing metaheuristics and sequential search-based WFS methods.
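EWFS itself is not spelled out here, so the following is only a generic greedy forward-selection wrapper in the same spirit: features are added one at a time, re-scoring (re-ranking) the remaining candidates each round and stopping when no candidate improves the model. The utility table and scorer are invented for the example.

```python
def forward_select(features, score, max_features=None):
    """Greedy wrapper: each round, re-score every remaining candidate
    alongside the already-selected set and add the best improver."""
    selected, remaining = [], list(features)
    best = score(selected)
    while remaining and (max_features is None or len(selected) < max_features):
        cand = max(remaining, key=lambda f: score(selected + [f]))
        cand_score = score(selected + [cand])
        if cand_score <= best:   # no candidate improves the model: stop
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected

# invented per-feature utilities and a scorer with a cost per extra feature
utility = {"loc": 30, "complexity": 25, "churn": 10, "comments": -5}
score = lambda feats: sum(utility[f] for f in feats) - 2 * len(feats) ** 2
print(forward_select(utility, score))  # ['loc', 'complexity']
```

In a real wrapper the scorer would be cross-validated classifier performance, which is exactly what makes wrapper methods expensive and motivates cheaper re-ranking schemes.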

  • Research Article
  • 10.3390/signals6030042
Decision Tree and ANOVA as Feature Selection from Vibration Signals to Improve the Diagnosis of Belt Conveyor Idlers
  • Aug 13, 2025
  • Signals
  • João L L Soares + 7 more

This study aims to compare decision tree and Analysis of Variance (ANOVA) techniques as feature selection methods, combined with Wavelet Packet Decomposition (WPD) for feature extraction, to enhance the diagnosis of faults in belt conveyor idlers. Belt conveyors are widely used in mining for efficient transport, but idlers composed of rollers are frequently subject to failure, making continuous monitoring essential to ensure reliability. Automated diagnostic solutions using vibration signals and machine learning rely on signal processing for feature extraction, often requiring dimensionality reduction or feature selection to improve classification accuracy. Given the limitations of traditional techniques such as Principal Component Analysis (PCA) in handling temporal variations, decision tree and ANOVA emerge as effective alternatives for feature selection. This framework was applied to each feature selection method, and a Support Vector Machine (SVM) was used as the classification technique. The diagnostic performance of each method, including the case without feature selection, was evaluated. The results showed higher diagnostic accuracy for the approaches that applied the features selected by the decision tree and by ANOVA. The improvement in the diagnosis of roller failures with feature selection was corroborated by hit rates for failure mode, severity level, and location of a defective roller above 93.5%.
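The ANOVA side of this comparison reduces to a one-way F statistic per feature: features whose values differ strongly between condition groups score high. The vibration-feature values below are invented.

```python
def anova_f(groups):
    """One-way ANOVA F statistic: between-group vs. within-group variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# hypothetical values of one vibration feature, grouped by roller condition
healthy = [1.0, 1.1, 0.9]
faulty  = [2.0, 2.1, 1.9]
print(round(anova_f([healthy, faulty]), 1))  # 150.0
```

A decision tree achieves a similar ranking via impurity-based feature importances; both are then fed to the SVM as reduced feature sets.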

  • Research Article
  • 10.3389/fdata.2025.1686479
Enhanced SQL injection detection using chi-square feature selection and machine learning classifiers
  • Nov 19, 2025
  • Frontiers in Big Data
  • Emanuel Casmiry + 2 more

In the face of increasing cyberattacks, Structured Query Language (SQL) injection remains one of the most common and damaging types of web threats, accounting for over 20% of global cyberattack costs. However, due to its dynamic and variable nature, the current detection methods often suffer from high false positive rates and lower accuracy. This study proposes an enhanced SQL injection detection using Chi-square feature selection (FS) and machine learning models. A combined dataset was assembled by merging a custom dataset with the SQLiV3.csv file from the Kaggle repository. A Jensen–Shannon Divergence (JSD) analysis revealed moderate domain variation (overall JSD = 0.5775), with class-wise divergence of 0.1340 for SQLi and 0.5320 for benign queries. Term Frequency-Inverse Document Frequency (TF-IDF) was used to convert SQL queries into feature vectors, followed by the Chi-square feature selection to retain the most statistically significant features. Five classifiers, namely multinomial Naïve Bayes, support vector machine, logistic regression, decision tree, and K-nearest neighbor, were tested before and after feature selection. The results reveal that Chi-square feature selection improves classification performance across all models by reducing noise and eliminating redundant features. Notably, Decision Tree and K-Nearest Neighbors (KNN) models, which initially performed poorly, showed substantial improvements after feature selection. The Decision Tree improved from being the second-worst performer before feature selection to the best classifier afterward, achieving the highest accuracy of 99.73%, precision of 99.72%, recall of 99.70%, F1-score of 99.71%, a false positive rate (FPR) of 0.25%, and a misclassification rate of 0.27%. These findings highlight the crucial role of feature selection in high-dimensional data environments. 
Future research will investigate how feature selection impacts deep learning architectures, adaptive feature selection, incremental learning approaches, robustness against adversarial attacks, and evaluate model transferability across production web environments to ensure real-time detection reliability, establishing feature selection as a vital step in developing reliable SQL injection detection systems.
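The TF-IDF vectorisation step used before the chi-square filter can be sketched in a few lines; the toy queries below are illustrative, not drawn from SQLiV3.csv, and tokenisation is naive whitespace splitting.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Map each document to {term: tf-idf weight}."""
    n = len(docs)
    # document frequency: in how many documents each term appears
    df = Counter(t for doc in docs for t in set(doc.split()))
    out = []
    for doc in docs:
        terms = doc.split()
        tf = Counter(terms)
        out.append({t: (tf[t] / len(terms)) * math.log(n / df[t])
                    for t in tf})
    return out

queries = [  # toy SQL strings for illustration
    "select name from users",
    "select * from users where 1 = 1",
]
weights = tf_idf(queries)
# "select" appears in every document, so its idf is log(2/2) = 0
print(weights[0]["select"])  # 0.0
```

Terms common to all queries get zero weight, while tokens typical of injection payloads stay prominent; the chi-square filter then keeps the vector dimensions most associated with the SQLi/benign labels.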

  • Research Article
  • Cited by 1
  • 10.11591/eei.v14i1.8421
Optimizing earthquake damage prediction using particle swarm optimization-based feature selection
  • Feb 1, 2025
  • Bulletin of Electrical Engineering and Informatics
  • Nurul Anisa Sri Winarsih + 5 more

Earthquakes have destroyed economies and killed many people in many countries. Emergency response actions immediately after an earthquake significantly reduce economic losses and save lives, so accurate earthquake damage predictions are needed. This research looks at how machine learning (ML) techniques are used to predict damage from earthquakes. The ML algorithms used are k-nearest neighbors (KNN), decision tree (DT), random forest (RF), and Naïve Bayes (NB). Feature selection is necessary to select the most relevant features from big data. One of the most commonly used algorithms to optimize ML is particle swarm optimization (PSO), which is also suitable for feature selection. This research compares various variants of PSO. Based on the research, the RF algorithm with Phasor PSO has the highest fitness score. This process succeeded in reducing the feature set from 38 features to 14 features. After feature selection, the KNN, DT, and RF algorithms all improved; RF obtained the best accuracy, namely 72.989%. The processing time of DT, RF, and NB is faster than before. In conclusion, the ML algorithms can be combined with PSO feature selection to create a classification model that performs better than without feature selection.
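A minimal binary-PSO feature selector in the spirit described (a generic sketch, not the paper's Phasor PSO variant): each particle is a 0/1 mask over the features, and a sigmoid of its velocity gives the probability of switching each bit on. The fitness function below is an invented toy.

```python
import math
import random

def pso_feature_select(n_feats, fitness, n_particles=8, n_iter=30, seed=1):
    """Minimal binary PSO: each particle is a 0/1 mask over the features."""
    rng = random.Random(seed)
    sigmoid = lambda v: 1 / (1 + math.exp(-v))
    pos = [[rng.randint(0, 1) for _ in range(n_feats)] for _ in range(n_particles)]
    vel = [[0.0] * n_feats for _ in range(n_particles)]
    pbest = [p[:] for p in pos]            # each particle's best-known mask
    gbest = max(pos, key=fitness)[:]       # swarm's best-known mask
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_feats):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] += 1.5 * r1 * (pbest[i][d] - pos[i][d]) \
                           + 1.5 * r2 * (gbest[d] - pos[i][d])
                # velocity sets the probability of switching the bit on
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            if fitness(pos[i]) > fitness(pbest[i]):
                pbest[i] = pos[i][:]
            if fitness(pos[i]) > fitness(gbest):
                gbest = pos[i][:]
    return gbest

# invented fitness: features 0 and 2 are informative, each selected feature costs 1
fitness = lambda mask: 2 * mask[0] + 2 * mask[2] - sum(mask)
print(pso_feature_select(5, fitness))
```

In the damage-prediction setting the fitness would be cross-validated classifier accuracy minus a penalty on the number of selected features, which is how a 38-feature set can shrink to 14.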

  • Research Article
  • Cited by 7
  • 10.3233/ifs-2009-0421
Application of multi-class support vector machines for power system on-line static security assessment using DT - based feature and data selection algorithms
  • Jan 1, 2009
  • Journal of Intelligent & Fuzzy Systems
  • M Mohammadi + 1 more

This paper presents a multi-class Support Vector Machine (SVM) based algorithm for on-line static security assessment of power systems. The proposed SVM-based security assessment algorithm requires very little training time and space in comparison with traditional machine learning methods such as Artificial Neural Network (ANN) based algorithms. In addition, the proposed algorithm is faster than existing algorithms. One of the main issues in applying a machine learning method is feature selection. In this paper, a new Decision Tree (DT) based feature selection algorithm is presented. The proposed SVM algorithm has been applied to the New England 39-bus power system. The simulation results show the effectiveness and stability of the proposed method for on-line static security assessment. The effectiveness of the proposed feature selection algorithm has also been investigated and compared with different feature selection algorithms; the simulation results demonstrate its effectiveness.
