Abstract

Applying data mining to high-dimensional datasets can degrade classification accuracy and increase computation time. One solution is to reduce the dimensionality of the dataset through feature selection. This research aims to identify an appropriate feature selection method for the classification process and to determine whether feature selection improves accuracy. Commonly used feature selection methods include information gain, gain ratio, chi-square, and correlation-based feature selection. This research compares these four methods to determine whether feature selection always increases the accuracy and reduces the computation time of a classification algorithm. The results show that applying the four methods decreases the accuracy of the NB, K-NN, SVM, DT, and ID3 algorithms, but reduces the computation time of the K-NN, SVM, and ID3 algorithms. Across the four feature selection methods, the most influential attributes are SSLfinal_State, Having_sub_domain, URL_of_Anchor, Preffix_suffix, SFH, Domain_registration_length, Links_intags, Web_traffic, Request_url, and Google_index. These results indicate that feature selection can decrease the accuracy of a classification algorithm, owing either to the characteristics of the data or to the classification algorithm itself. Feature selection can also reduce computation time, which speeds up the classification algorithm, because the dimensionality of the data is smaller.
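As an illustration of two of the ranking criteria named above, the following minimal sketch computes information gain and gain ratio for discrete features and ranks the features by score. It is not the implementation used in this research. The function names, the feature names `informative` and `noise`, and the eight toy rows are all assumptions invented for the example, not taken from the dataset.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature)
    if split_info == 0:  # a constant feature carries no split information
        return 0.0
    return information_gain(feature, labels) / split_info


def rank_features(columns, labels, score=information_gain):
    """Return feature names ordered from highest to lowest score."""
    return sorted(columns, key=lambda name: score(columns[name], labels),
                  reverse=True)


# Hypothetical toy data: one feature that mirrors the label, one that does not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "noise": [1, 0, 1, 0, 1, 0, 1, 0],
    "informative": [1, 1, 1, 1, 0, 0, 0, 0],
}
ranking = rank_features(columns, labels)
print(ranking)
```

Keeping only the top-ranked names from such a list before training is what shrinks the data dimension. Whether accuracy rises or falls afterwards depends on the data and the classifier, as the results summarized above indicate.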
