Abstract

Classification methods in data mining require a good learning process to achieve optimal accuracy. This is possible when the dataset is ideal, balanced, and contains many records, but in practice such datasets are difficult to obtain. Imputation is one way to fill in missing values in a non-ideal dataset. A large number of missing values can reduce the number of records available to the learning process and affect accuracy. This research analyzes the effects of zero and mean imputation on classification accuracy in small datasets, using the Naïve Bayes classifier (NBC) and NBC optimized with Particle Swarm Optimization (PSO). Tests were carried out on five datasets from the UCI database, one of which required no imputation because it had no missing values. The test results show that PSO improved the accuracy of NBC classification on all datasets, while imputation improved classification accuracy by up to 4.33% on the Biomarker dataset.
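
As an illustration of the two imputation strategies named above, the following is a minimal sketch, not the authors' implementation. It assumes missing entries are represented as `None` and that all features are numeric; the function name `impute` and the toy data are hypothetical and do not come from the study.

```python
from statistics import mean


def impute(rows, strategy="mean"):
    """Return a copy of rows with None entries filled column by column.

    strategy="zero": every missing value becomes 0.0.
    strategy="mean": every missing value becomes the mean of the observed
    values in the same column (0.0 if the column has no observed values).
    """
    if strategy not in ("zero", "mean"):
        raise ValueError("strategy must be 'zero' or 'mean'")

    n_cols = len(rows[0])
    fills = []
    for j in range(n_cols):
        # Collect only the values that are actually present in column j.
        observed = [row[j] for row in rows if row[j] is not None]
        if strategy == "zero" or not observed:
            fills.append(0.0)
        else:
            fills.append(mean(observed))

    # Build new rows so the input data is left unmodified.
    return [
        [fills[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Toy data (hypothetical): three records, two numeric features.
data = [[1.0, None], [3.0, 4.0], [None, 8.0]]
zero_filled = impute(data, "zero")
mean_filled = impute(data, "mean")
```

Either filled dataset keeps all three records, whereas dropping incomplete records would leave only one, which is the loss of training data the abstract refers to.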
