Abstract

Data mining is the process of extracting information from large, high-dimensional data to obtain knowledge that supports decision making. Problems in data mining often arise when processing high-dimensional data. This study addresses them by applying a hybrid genetic algorithm and particle swarm optimization (HGAPSO) method to improve the performance of the C5.0 decision tree classification model, so that decisions on classification data can be made quickly and accurately. Three datasets from the University of California, Irvine (UCI) machine learning repository were used: lymphography, vehicle, and wine. Combined with the C5.0 decision tree, the HGAPSO algorithm gave the best accuracy on the higher-dimensional data: the lymphography and vehicle datasets reached accuracies of 83.78% and 71.54%, respectively. On the wine dataset, accuracy was 0.56% lower than with the conventional method, which the study attributes to the wine data having fewer dimensions than the lymphography and vehicle datasets.

Highlights

  • Data mining is the process of extracting information from high-dimensional data to obtain knowledge that supports decision making

  • This research addresses problems in high-dimensional data by using the hybrid genetic algorithm and particle swarm optimization method to improve the performance of the C5.0 decision tree classification model, so that decisions on classification data can be made quickly and accurately

  • Tests of the hybrid genetic algorithm (GA) and particle swarm optimization (PSO) method with the C5.0 decision tree on the lymphography and vehicle datasets yielded higher accuracy than the conventional method: 83.78% and 71.54%, respectively


Summary

INTRODUCTION

Data mining is the process of extracting information from high-dimensional data to obtain knowledge that supports decision making. Two methods often used to solve problems in high-dimensional data are the genetic algorithm (GA) and particle swarm optimization (PSO) [2]. Other research has used hybrid genetic algorithm and particle swarm optimization methods to solve bi-level linear programming problems. Conventional optimization techniques are considered suboptimal for feature selection on high-dimensional data, so feature selection for the C5.0 decision tree method needs to be optimized by maximizing the application of the model in preprocessing [6]. This research addresses problems in high-dimensional data by using the hybrid genetic algorithm and particle swarm optimization method to improve the performance of the C5.0 decision tree classification model, so that decisions on classification data can be made quickly and accurately.
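To make the idea of hybrid GA and PSO feature selection concrete, the following is a minimal sketch and not the procedure used in this study. It assumes one common way of combining the two methods: in each iteration the better half of the population is moved by a binary PSO step, and the worse half is replaced by GA offspring (uniform crossover plus bit-flip mutation) of the better half. The names hgapso_select and toy_fitness, the parameter values, and the fitness function itself are hypothetical. In a real experiment the fitness would be the accuracy of a C5.0 decision tree trained on the selected features.

```python
import math
import random


def toy_fitness(mask, relevant):
    """Hypothetical stand-in for classifier accuracy.

    Rewards selected features that are in `relevant` and penalizes
    every other selected feature.
    """
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in relevant)
    extras = sum(mask) - hits
    return hits - 0.25 * extras


def hgapso_select(n_features, fitness, pop_size=20, iterations=40, seed=0):
    """Assumed hybrid scheme: PSO on the better half, GA on the worse half."""
    rng = random.Random(seed)
    # Each individual is a binary mask: 1 keeps a feature, 0 drops it.
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    vel = [[0.0] * n_features for _ in range(pop_size)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(pop_size), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    half = pop_size // 2
    w, c1, c2 = 0.7, 1.5, 1.5      # assumed PSO coefficients
    p_mut = 1.0 / n_features       # assumed mutation rate

    for _ in range(iterations):
        order = sorted(range(pop_size), key=lambda i: fitness(pos[i]), reverse=True)
        elite, rest = order[:half], order[half:]

        # PSO step: binary PSO with a sigmoid transfer function.
        for i in elite:
            for d in range(n_features):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0

        # GA step: replace the worse half with mutated offspring of the better half.
        for i in rest:
            a, b = rng.sample(elite, 2)
            child = [pos[a][d] if rng.random() < 0.5 else pos[b][d]
                     for d in range(n_features)]
            pos[i] = [1 - bit if rng.random() < p_mut else bit for bit in child]
            vel[i] = [0.0] * n_features

        # Update per-slot and global bests.
        for i in range(pop_size):
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
            if f > gbest_fit:
                gbest, gbest_fit = pos[i][:], f

    return gbest, gbest_fit


if __name__ == "__main__":
    relevant = {1, 4, 7, 10}  # hypothetical "informative" feature indices
    mask, score = hgapso_select(12, lambda m: toy_fitness(m, relevant))
    print("selected features:", [i for i, bit in enumerate(mask) if bit])
    print("fitness:", score)
```

Swapping toy_fitness for a function that trains a classifier on the masked columns and returns its validation accuracy would turn this into a wrapper-style feature selector, at the cost of one model fit per fitness call.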

MATERIAL AND METHODS
Research Methods
Lymphography 148
Testing Method
RESULTS AND DISCUSSION
Findings
CONCLUSION
