Abstract

Data mining is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. Data mining can be classified into various models such as clustering, decision trees, association rules, and sequential patterns and time series. This work focuses on clustering as a technique for analyzing gene expression data in bioinformatics. Innovative technologies in experimental molecular biology, such as DNA microarrays, have produced huge amounts of valuable gene expression profile data. It is now possible to monitor the expression levels of thousands of genes simultaneously during important biological processes and across collections of related samples. The constant growth of experimental data has created new challenges in maintenance, storage, and analysis aimed at deriving meaningful patterns. Many clustering algorithms have been proposed for analyzing gene expression data, and evaluating which of them are feasible and applicable has become an important issue in current bioinformatics research. In this article, four clustering algorithms (K-Means, hierarchical clustering, Self-Organizing Map (SOM), and DBSCAN) are studied on Iris flower gene expression datasets. The clustering efficiency of each algorithm is assessed with various external and internal clustering evaluation indices. The results are further analyzed by plotting graphs and charts across algorithms, indices, and datasets to examine the similarity of the clusters generated by different algorithms and thereby enable comparison of the clustering methods.
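
The difference between external and internal evaluation indices can be illustrated with a minimal Python sketch. This is not the evaluation code used in this work: the Rand index and the mean silhouette width are chosen here only as common representatives of the two families, and the toy points and labelings are hypothetical. The silhouette function assumes Euclidean distance and at least two clusters.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def rand_index(labels_true, labels_pred):
    """External index: fraction of point pairs on which two labelings agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def silhouette(points, labels):
    """Internal index: mean silhouette width (needs at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
truth = [0, 0, 0, 1, 1, 1]
pred_a = [1, 1, 1, 0, 0, 0]  # same partition, labels swapped
pred_b = [0, 0, 1, 1, 1, 1]  # one point assigned to the wrong group

print(rand_index(truth, pred_a), rand_index(truth, pred_b))
print(silhouette(points, truth), silhouette(points, pred_b))
```

The external index needs reference labels and ignores label names, so `pred_a` scores a Rand index of 1.0 against `truth`. The internal index uses only the points and the clustering itself, so it can be computed when no reference labels exist.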
