Abstract

Introduction: When building a predictive data mining model, achieving the highest possible accuracy is a major concern for researchers, so studying the impact of data representation on classification accuracy is essential. Recent research has tended to move from one classification technique to another in search of higher accuracy for building strong models. The ultimate goal of this research is to add an extra dimension by focusing on the effect that data representation may have on the classification accuracy of predictive data mining techniques.

Methods: In this research, seven different data representations were applied to several classification techniques: the AS_IS representation, three binary representations and three normalization representations. The binary representations were simple binary, flag and thermometer; the normalization representations were min-max, sigmoidal and standard deviation normalization. The seven representations were applied to eight classifiers: Neural Network, Logistic Regression, K-Nearest Neighbor, Support Vector Machine, Classification Tree, Naive Bayes, Rule-Based and Random Forest Decision Tree. Two datasets, Wisconsin Breast Cancer and German Credit, both with a Boolean target class, were used to test classification accuracy.

Results: Fourteen data representations were generated from the two datasets, Wisconsin Breast Cancer and German Credit, seven for each. The classifiers were run on these representations using the Orange software. The results showed variation in classification accuracy among all classifiers. With the exception of Naive Bayes, whose accuracy differed by more than 60% between its lowest and highest values, the classification accuracy of the other classifiers varied by about 4.2% across representations.
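
The representations named in the Methods can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the definitions below are assumptions based on common textbook usage: min-max scaling to [0, 1], standard deviation normalization as a z-score, sigmoidal normalization as a logistic function of the z-score, and binary codes for an ordinal attribute taking integer values 1 to `n_levels`. All function names are hypothetical and are not taken from the paper or from Orange.

```python
import math
from statistics import mean, pstdev


def min_max(values):
    """Rescale values linearly to [0, 1] (assumed min-max definition)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standard deviation normalization, assumed here to be (x - mean) / std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def sigmoidal(values):
    """Sigmoidal normalization, assumed here to be a logistic squash of the z-score."""
    return [1.0 / (1.0 + math.exp(-z)) for z in z_score(values)]


def simple_binary(value, n_levels):
    """Plain base-2 code of an ordinal value, padded to a fixed width."""
    width = max(1, n_levels.bit_length())
    return [int(bit) for bit in format(value, f"0{width}b")]


def flag(value, n_levels):
    """Flag (one-hot) code: a single 1 at the position of the value."""
    return [1 if i == value else 0 for i in range(1, n_levels + 1)]


def thermometer(value, n_levels):
    """Thermometer code: the first `value` positions are 1, the rest 0."""
    return [1 if i <= value else 0 for i in range(1, n_levels + 1)]
```

For an attribute with ten levels, the value 3 would be encoded as `[0, 0, 1, 1]` in simple binary, as a single 1 in the third of ten positions in flag form, and as three leading 1s followed by seven 0s in thermometer form.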
