β-Thalassemia Knowledge Elicitation Using Data Engineering: PCA, Pearson’s Chi Square and Machine Learning

P Paokanta

doi:10.7763/ijcte.2012.v4.561

Abstract

Data Engineering is one of several Knowledge Elicitation and Analysis methods. Within it, Feature Selection plays an important role in data mining, especially in classification tasks. Filtering is an important pre-treatment for every classification process: selecting appropriate variables not only decreases computational time and cost but also increases classification accuracy. In this paper, Thalassemia knowledge was elicited using Data Engineering techniques (PCA, Pearson's Chi Square and Machine Learning). This knowledge is presented as a comparison of the classification performance of machine learning techniques using Principal Components Analysis (PCA) and Pearson's Chi Square for screening the genotypes of β-Thalassemia patients. Using PCA, the classification results show that the Multi-Layer Perceptron (MLP) is the best algorithm, with an accuracy of 86.61%, followed by K-Nearest Neighbors (KNN), Naive Bayes, Bayesian Networks (BNs) and Multinomial Logistic Regression with accuracies of 85.83%, 85.04%, 85.04% and 82.68%, respectively. These results were also compared with those obtained using Pearson's Chi Square and presented that…. In future work, we will investigate other feature selection techniques, such as hybrid methods and filtering methods, in order to improve classification performance.
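
To illustrate the kind of filter-based feature selection the abstract refers to, the following is a minimal sketch of scoring categorical features against a class label with Pearson's Chi Square statistic. It is not the paper's implementation: the function names (`chi_square_score`, `rank_features`), the feature names and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson's chi-square statistic for one categorical feature vs. class labels."""
    n = len(feature)
    if n == 0 or n != len(labels):
        raise ValueError("feature and labels must be non-empty and of equal length")

    observed = Counter(zip(feature, labels))  # contingency table cell counts
    row_totals = Counter(feature)             # counts per feature value
    col_totals = Counter(labels)              # counts per class

    chi2 = 0.0
    for value, row_total in row_totals.items():
        for cls, col_total in col_totals.items():
            # Expected count under independence of feature and class
            expected = row_total * col_total / n
            chi2 += (observed[(value, cls)] - expected) ** 2 / expected
    return chi2


def rank_features(columns, labels):
    """Return (feature name, score) pairs sorted from highest to lowest score."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up data (not from the paper): two classes and two candidate features.
labels = ["A", "A", "A", "B", "B", "B"]
columns = {
    "informative": ["low", "low", "low", "high", "high", "high"],
    "uninformative": ["x", "y", "x", "y", "x", "y"],
}

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a filter approach of this kind, features with higher scores would be kept and passed to a classifier, while low-scoring features would be dropped before training. The statistic here is computed for categorical values only; continuous measurements would need to be discretized first, which is an assumption of this sketch.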
