IDENTIFYING IMPORTANT GENES IN OVARIAN CANCER FROM HIGH-DIMENSIONAL MICROARRAY DATA USING SIFS-CART METHOD

Ni Kadek Emik Sapitri, Umu Sa'Adah, Nur Shofianah

doi:10.30598/barekengvol18iss3pp1909-1918

Open Access


Journal: BAREKENG: Jurnal Ilmu Matematika dan Terapan
Publication Date: Jul 31, 2024
License type: CC BY-SA 4.0

Abstract

Ovarian cancer can be identified from microarray data using machine learning. Many studies focus only on improving classification algorithms to achieve higher performance. However, the purpose of classification is not only to obtain high performance but also to extract new knowledge from the results. This research pursues both goals. Using a hybrid of Supervised Infinite Feature Selection (SIFS) and Classification and Regression Tree (CART), referred to as SIFS-CART, this research aims to predict ovarian cancer and to identify genes potentially associated with it. The data used are the OVA_ovary dataset. In the best SIFS-CART model, SIFS reduced the 10935 genes in the initial OVA_ovary dataset to 1000 genes, and CART was then built on these 1000 genes. Because the microarray data are imbalanced, performance was measured with balanced accuracy (BA): the best SIFS-CART model achieves 85.7% BA in training and 83.2% BA in testing. The optimal CART in the best SIFS-CART model uses only four of the 1000 selected genes: STAR, WT1, PEG3, and ASPN. Based on a review of the medical literature, STAR, WT1, and PEG3 play an important role in ovarian cancer. The relationship between ASPN and ovarian cancer, however, has not yet been studied in detail by medical researchers.
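The abstract reports balanced accuracy (BA) rather than plain accuracy because the classes are imbalanced. As a minimal sketch, assuming the usual definition of BA as the mean of per-class recall (the abstract does not state the formula), the function and toy labels below are illustrative only and are not taken from the OVA_ovary dataset or the authors' code:

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall (assumed definition of BA)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall of each class that appears in y_true, then the unweighted mean.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: 8 samples of class 1, 2 samples of class 0.
y_true = [1] * 8 + [0] * 2
# A trivial classifier that always predicts the majority class.
y_pred = [1] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
ba = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, ba)
```

On these toy labels the majority-class predictor reaches a plain accuracy of 0.8 but a BA of only 0.5, which is why BA is the more informative metric for imbalanced data. In practice, scikit-learn's `balanced_accuracy_score` computes the same quantity.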

Full Text