Abstract

Cancer is curable if it is detected early. One way to detect cancer is to analyze changes in gene expression in the suspected tissue. Serial analysis of gene expression (SAGE) is a sequencing technique for measuring the expression levels of genes. Cancer detection can be posed as a binary classification problem: deciding whether a tissue is cancerous or normal. SAGE libraries contain the expression levels of thousands of genes, which serve as the features. Using all of these features for classification is impractical, and general-purpose feature selection algorithms are not efficient on this data. In this paper, closed frequent itemset mining is proposed as a feature selection technique for identifying a small set of features that can distinguish the two classes efficiently. The performance of the proposed technique is evaluated on SAGE data from breast tissue, and a group of 26 genes is selected as the best features. Two well-known classifiers, extreme learning machine (ELM) and support vector machine (SVM), are used to evaluate the effectiveness of the selected features in classification, and the proposed method is found to work well with both classifiers.
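To make the idea of closed frequent itemset mining concrete, the following is a minimal sketch on toy data, not the procedure used in the paper. It assumes that each sample is turned into a "transaction" containing the genes whose tag count meets a threshold, and that the selected features are the union of genes appearing in closed frequent itemsets. The gene names, tag counts, threshold, and minimum support are all hypothetical, and the brute-force enumeration is only feasible at toy scale. An itemset is closed if no proper superset has the same support.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Brute-force closed frequent itemset mining (toy scale only)."""
    items = sorted(set().union(*transactions))
    support = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                support[itemset] = count
                found = True
        if not found:
            # Anti-monotonicity: if no itemset of this size is frequent,
            # no larger itemset can be frequent either.
            break
    # Keep only itemsets with no proper superset of equal support.
    return {
        itemset: count
        for itemset, count in support.items()
        if not any(itemset < other and count == support[other] for other in support)
    }


# Hypothetical tag counts for four samples of one class (not real SAGE data).
toy_counts = [
    {"geneA": 12, "geneB": 9, "geneC": 7, "geneD": 0},
    {"geneA": 15, "geneB": 8, "geneC": 1, "geneD": 0},
    {"geneA": 11, "geneB": 6, "geneC": 0, "geneD": 9},
    {"geneA": 10, "geneB": 2, "geneC": 8, "geneD": 1},
]
THRESHOLD = 5  # assumed binarization cutoff: a gene counts as "expressed" at or above this

# One transaction per sample: the set of expressed genes.
transactions = [
    frozenset(gene for gene, count in sample.items() if count >= THRESHOLD)
    for sample in toy_counts
]

closed = closed_frequent_itemsets(transactions, min_support=2)

# Assumed selection rule: take every gene that appears in a closed frequent itemset.
selected_genes = sorted(set().union(*closed))
print(selected_genes)
```

In this toy run, `{geneB}` is frequent but not closed, because `{geneA, geneB}` occurs in exactly the same samples. Dropping such redundant itemsets is what keeps the candidate set small. The selected genes could then be passed as input columns to a classifier such as an SVM.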
