Cancer Classification Based on the Features of Itemset Sequence Pattern of TP53 Protein Code Using Deep Miden - KNN

Imam Cholissodin,Edy Santoso,Dian Eka Ratnawati,Nurul Hidayat,Marji Marji

doi:10.25126/jitecs.202271401

Abstract

Cancer is a disease that is still difficult to identify up to today. One of the causes of cancer is genetic modification that because of mutations in p53 gene. Healthy cells have a p53 wild type protein (normal) that is able to manage DNA separation. If DNA mutates, it will be difficult to detect cancer because the composition of the protein has changed. Bioinformatics is a combination of biology and information engineering (TI) that is utilized to manage data. One of the applications of data mining in bioinformatics is the development of pharmaceutical and medical industries. Data mining classification can use variety of methods including K-Nearest Neighbor (KNN), C45, ID3, and several other methods. One of the most reliable data classification methods is KNN. In this study, the development used two algorithms. The first was with the modification of the k-fold method, which divided two data into training data and test data, in which test-1 data and test-2 data were made into slices. The second was by a method for selecting an itemset sequence pattern that had the largest Gain Information, either 2 itemsets, 3 itemsets, and so on (Deep Miden). The best accuracy result of 96.00% was obtained through the process of computation testing in the server based on variations in terms of the number of patterns of Deep Miden itemset sequences and several k values on KNN classification method.

Full Text