Abstract

Abnormal growth of cells with the potential to spread to other parts of the human body can occur for multiple reasons, such as changes in the activity of DNA segments. Altered DNA methylation is known to be an important factor in cancer development, because it changes DNA activity by preventing some of the DNA's normal activities. Feature extraction is used to reduce the dimensionality of high-dimensional datasets and to retain the features most useful for predicting gene expression in a cancer. A number of feature extraction methods have been used in the literature for this purpose. In this study, Semi-orthogonal Non-negative Matrix Factorization (SONMF) and Non-negative Matrix Factorization (NMF) were tested as feature extraction methods on four microarray cancer datasets and compared with FFT features, Symmetry of Methylation Density features, and Principal Component Analysis (PCA). Five classifiers, namely Naive Bayes, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forest, and Neural Network, were used to predict the gene expression of the four cancer microarray datasets. The experiments show that for the colon cancer dataset, SONMF and NMF combined with the Naive Bayes classifier performed best among the feature extraction methods. For the oral cancer dataset, the highest accuracy was observed with SONMF and the Neural Network classifier. For the leukemia dataset, the highest accuracy, 100%, was observed with NMF, SONMF, and PCA combined with the Neural Network and SVM classifiers. For the prostate cancer dataset, SONMF with the Naive Bayes classifier gave the highest accuracy. Overall, the results show that SONMF and NMF were more consistent than the other feature extraction methods and produced the features that gave the best prediction accuracy on the microarray cancer datasets.
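To make the feature extraction step concrete, the following is a minimal sketch of plain NMF used for dimensionality reduction. It is not the authors' implementation: it uses the standard multiplicative-update rule rather than the semi-orthogonal variant (SONMF), the data are synthetic, and the function name `nmf`, the rank `k=5`, and the iteration count are assumptions chosen for illustration. It also assumes the input matrix is non-negative, which may require rescaling real microarray values.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (samples x genes) as W @ H.

    Uses multiplicative updates for the Frobenius-norm objective.
    W (samples x k) holds the reduced features; H (k x genes) holds the basis.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_genes = X.shape
    # Random non-negative initialisation (an assumption; other schemes exist)
    W = rng.random((n_samples, k)) + eps
    H = rng.random((k, n_genes)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative because it only
        # multiplies by ratios of non-negative quantities.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic stand-in for a microarray matrix: 40 samples, 500 genes
rng = np.random.default_rng(42)
X = rng.random((40, 500))

W, H = nmf(X, k=5)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print("reduced feature matrix shape:", W.shape)
print("relative reconstruction error:", rel_err)
```

In a pipeline like the one the abstract describes, the rows of `W` would replace the original high-dimensional gene profiles as input to a classifier such as Naive Bayes or an SVM.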
