Abstract

Background: Mutations in deoxyribonucleic acid (DNA) are a common cause of colon cancer, and detecting them is the first step in diagnosis. One way to identify such mutations is to differentiate between normal and cancerous colon gene sequences. Early detection of the disease can avoid complications that may lead to death. In this study, 55 healthy and 55 cancerous colon-cell genes obtained from the National Center for Biotechnology Information (NCBI) GenBank are used. The sequences are first converted to numerical form with the electron–ion interaction pseudopotential (EIIP) representation, and a single-level discrete wavelet transform (DWT) with the Haar wavelet is then applied. Seven statistical features are extracted from the wavelet domain: mean, variance, standard deviation, autocorrelation, entropy, skewness, and kurtosis. The resulting values are passed to k-nearest neighbor (KNN) and support vector machine (SVM) classifiers.

Results: Classifier performance is evaluated with accuracy (ACC), F1 score, and Matthews correlation coefficient (MCC). These are 95%, 94.74%, and 0.9045, respectively, for SVM and 97.5%, 97.44%, and 0.9512, respectively, for KNN.

Conclusion: This study presents a novel system for colorectal cancer classification and detection with satisfactory results. The KNN classifier gives the best results, with the lowest error, although the SVM results are also acceptable.
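As a rough illustration of the feature-extraction step described in the abstract, the following Python sketch applies a single-level Haar DWT to an already-encoded numeric signal and computes the seven listed statistics. It is not the authors' implementation. The function names `haar_dwt` and `stat_features`, the padding of odd-length input, the lag-1 autocorrelation, the 8-bin histogram entropy, and the use of population (non-excess) moments are all assumptions, since the abstract does not specify these details.

```python
import math

def haar_dwt(signal):
    """Single-level Haar DWT; returns (approximation, detail) coefficients."""
    x = list(signal)
    if len(x) % 2:                      # assumption: pad odd-length input with its last sample
        x.append(x[-1])
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def stat_features(c, bins=8):
    """Seven summary statistics of a coefficient list (definitions are assumptions)."""
    n = len(c)
    mean = sum(c) / n
    var = sum((v - mean) ** 2 for v in c) / n            # population variance
    std = math.sqrt(var)
    # Standardized third and fourth moments; set to 0 for constant input.
    skew = sum((v - mean) ** 3 for v in c) / (n * std ** 3) if std else 0.0
    kurt = sum((v - mean) ** 4 for v in c) / (n * std ** 4) if std else 0.0
    # Lag-1 autocorrelation, normalized by the total variance.
    ac1 = (sum((c[i] - mean) * (c[i + 1] - mean) for i in range(n - 1))
           / (n * var)) if var else 0.0
    # Shannon entropy (bits) of an equal-width histogram of the values.
    lo, hi = min(c), max(c)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in c:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    entropy = -sum(k / n * math.log2(k / n) for k in counts if k)
    return {"mean": mean, "variance": var, "std": std, "autocorrelation": ac1,
            "entropy": entropy, "skewness": skew, "kurtosis": kurt}

# Example on a short, made-up numeric signal (not data from the study).
approx, detail = haar_dwt([0.1260, 0.1335, 0.0806, 0.1340, 0.1340, 0.0806])
print(stat_features(approx))
```

Whether the features are taken from the approximation band, the detail band, or both is not stated in the abstract; the example uses the approximation coefficients only for illustration.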

Highlights

  • Mutations in deoxyribonucleic acid (DNA) are a common cause of colon cancer

  • Many researchers worldwide have studied cancer in the hope of detecting the disease at an early stage and so reducing the risk of death

  • The electron–ion interaction pseudopotential (EIIP) method was used to convert the DNA sequences from strings into number values so that genomic signal processing (GSP) could be applied in the feature extraction step, and suitable classifiers were selected
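The EIIP conversion mentioned in the last highlight can be sketched as a simple lookup. The values below are the ones commonly tabulated for the four nucleotides. The function name `eiip_encode` and the choice to skip symbols other than A, C, G, and T (for example, the ambiguity code N) are assumptions made for this illustration, not details taken from the paper.

```python
# Commonly tabulated EIIP values for the four DNA nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

def eiip_encode(sequence):
    """Map a DNA string to a list of EIIP values.

    Assumption: symbols other than A, C, G, T are skipped.
    """
    return [EIIP[base] for base in sequence.upper() if base in EIIP]

# Example on a short, made-up sequence (not a GenBank record).
signal = eiip_encode("ATGCCGTA")
print(signal)
```

The resulting numeric signal is what the signal-processing steps (here, the wavelet transform) operate on.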


Summary

Results

Three parameters were calculated to evaluate the performance of the proposed method: Matthews correlation coefficient (MCC), F1 score, and accuracy (ACC).

Discussions

Out of a total of 20 cancer genes and 20 normal genes, the KNN algorithm identified 19 cancer genes and 20 normal genes (TP = 19, FP = 1, TN = 20, and FN = 0), while the SVM network recognized 18 cancer genes and 20 normal genes (TP = 18, FP = 2, TN = 20, and FN = 0) (Table 1). The results of both methods were satisfactory. A higher ACC, F1 score, and MCC indicate a more successful classification process and a more effective classifier. These results indicate that the classifiers can recognize the required target with minimal error.
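The reported scores can be checked against the confusion-matrix counts given above. The sketch below uses the standard definitions of ACC, F1 score, and MCC. The function name `classification_metrics` is hypothetical, and the counts are taken as stated in the text.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Return (ACC, F1, MCC) from binary confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return acc, f1, mcc

# Counts as (TP, FP, TN, FN), copied from the text.
reported = {"KNN": (19, 1, 20, 0), "SVM": (18, 2, 20, 0)}
for name, counts in reported.items():
    acc, f1, mcc = classification_metrics(*counts)
    print(f"{name}: ACC={acc:.2%}  F1={f1:.2%}  MCC={mcc:.4f}")
# KNN: ACC=97.50%  F1=97.44%  MCC=0.9512
# SVM: ACC=95.00%  F1=94.74%  MCC=0.9045
```

These values match the figures quoted in the abstract; note that MCC is a coefficient between -1 and 1, not a percentage.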

