Abstract

Discretization is an important data preprocessing technique used in data mining and knowledge discovery processes. The purpose of discretization is to transform or partition continuous values into discrete ones. In this manner, many data mining classification algorithms can be applied the discrete data more concisely and meaningfully than continuous ones, resulting in better performance. In this study, an improved version of the unsupervised equal frequency (EF) discretization method, EF_Unique, is proposed for enhancing the performance of discretization. The proposed EF_Unique discretization method is based on the unique values of the attribute to be discretized. In order to test the success of the proposed method, 17 benchmark datasets from the UCI repository and four data mining classification algorithms were used, namely Naive Bayes, C.45, k-nearest neighbor, and support vector machine. The experimental results of the proposed EF_Unique discretization method were compared with those obtained using well-known discretization methods; unsupervised equal width (EW), EF, and supervised entropy-based ID3 (EB-ID3). The results show that the proposed EF_Unique discretization method outperformed EW, EF, and EB-ID3 discretization methods in 43, 41, and 27 out of the 68 benchmark tests, respectively.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.