Abstract

In bioinformatics, clustering the gene expression profiles of tissue samples across experimental conditions has gained importance with the advent of microarray technology. Such clustering also bears on cancer diagnosis: properly classifying cancer tissue samples profiled with microarrays helps detect cancers automatically. In this paper we develop a semi-supervised clustering technique for partitioning these gene expression data sets. Semi-supervised clustering combines unsupervised and supervised classification: it uses a small amount of labeled information together with a large collection of unlabeled data. Here, a multi-objective semi-supervised clustering technique is developed for the cancer tissue classification problem, and different combinations of objective functions are used. As supervised information, we assume that class labels are available for 10% of the data. The proposed technique is evaluated on three publicly available benchmark cancer data sets (brain tumor, adult malignancy, and small round blue cell tumors). Two quality measures, the Adjusted Rand Index and Classification Accuracy, are used to assess the obtained partitionings. The results are compared with those of several state-of-the-art clustering techniques. Moreover, significant gene markers are identified and visualized from the obtained clustering solutions.
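To make the two quality measures concrete, the following is a minimal Python sketch, not the paper's implementation. The Adjusted Rand Index follows the standard pair-counting definition. The abstract does not say how Classification Accuracy is computed for clusters, so the sketch assumes each cluster is mapped to its majority true class; the function names `adjusted_rand_index` and `majority_vote_accuracy` are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, cluster_labels):
    """Adjusted Rand Index between two labelings of the same samples."""
    n = len(true_labels)
    if n < 2:
        return 1.0  # no sample pairs to compare; treated as perfect agreement
    # Contingency-table cells, plus row (class) and column (cluster) totals.
    cells = Counter(zip(true_labels, cluster_labels))
    rows = Counter(true_labels)
    cols = Counter(cluster_labels)
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single class and a single cluster
    return (sum_cells - expected) / (max_index - expected)


def majority_vote_accuracy(true_labels, cluster_labels):
    """Accuracy after mapping each cluster to its most frequent true class.

    Assumption: the paper may define Classification Accuracy differently.
    """
    cells = Counter(zip(cluster_labels, true_labels))
    best = {}  # cluster -> size of its largest single-class group
    for (cluster, _cls), count in cells.items():
        if count > best.get(cluster, 0):
            best[cluster] = count
    return sum(best.values()) / len(true_labels)
```

Both functions take two equal-length label sequences, one with the known tissue classes and one with the cluster assignments. Neither depends on how the clusters are numbered, so a partitioning that matches the true classes up to a relabeling scores 1.0 on both.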
