Abstract
Gene clustering is a common methodology for analyzing genes with similar expression trajectories. Most clustering algorithms require the number of clusters to be specified a priori, and this number is usually hard to estimate, even for domain experts. In this paper, we use a Niched Pareto k-means Genetic Algorithm (GA) to cluster mRNA data. Running the multi-objective GA yields a Pareto-optimal front whose solutions offer alternative choices for the optimal number of clusters. We analyze the clustering results under two cluster validity techniques commonly cited in the literature, namely the DB index and the SD index, which allows the candidate numbers of clusters to be ranked under each validity index. We tested the proposed clustering approach in experiments on three data sets: figure2data, cancer (NCI60), and Leukaemia. The results are promising; they demonstrate the applicability and effectiveness of the proposed approach.

Keywords: multi-objective genetic algorithm; clustering; validity analysis; gene expression data analysis
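To make the validity-ranking step concrete, the following is a minimal sketch of how candidate cluster counts might be ranked by the DB index, taken here to be the Davies-Bouldin index in its standard form (lower is better). It is not the paper's method: the candidate partitions come from plain Lloyd's k-means with a deterministic farthest-first seeding rather than from the niched Pareto GA, the data are synthetic 2-D points rather than expression profiles, and every function name (`farthest_first_centers`, `kmeans_labels`, `davies_bouldin`, `rank_cluster_counts`, `make_toy_data`) is hypothetical.

```python
import numpy as np


def farthest_first_centers(X, k):
    """Deterministic seeding (an assumption of this sketch): start at X[0],
    then repeatedly add the point farthest from all centers chosen so far."""
    centers = [X[0]]
    for _ in range(1, k):
        chosen = np.array(centers)
        d = np.linalg.norm(X[:, None, :] - chosen[None, :, :], axis=2).min(axis=1)
        centers.append(X[d.argmax()])
    return np.array(centers)


def kmeans_labels(X, k, n_iter=100):
    """Plain Lloyd's k-means; a stand-in for the GA, not the paper's algorithm."""
    centers = farthest_first_centers(X, k)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def davies_bouldin(X, labels):
    """Davies-Bouldin index computed over the non-empty clusters."""
    ids = np.unique(labels)
    cents = np.array([X[labels == j].mean(axis=0) for j in ids])
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == j] - c, axis=1).mean()
        for j, c in zip(ids, cents)
    ])
    # Separation: pairwise centroid distances; the diagonal is excluded.
    sep = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)
    worst = ((scatter[:, None] + scatter[None, :]) / sep).max(axis=1)
    return float(worst.mean())


def rank_cluster_counts(X, candidate_ks):
    """Return candidate cluster counts sorted best-first, plus their scores."""
    scores = {k: davies_bouldin(X, kmeans_labels(X, k)) for k in candidate_ks}
    return sorted(scores, key=scores.get), scores


def make_toy_data(seed=0):
    """Three tight, well-separated 2-D blobs (synthetic, for illustration only)."""
    rng = np.random.default_rng(seed)
    blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    return np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in blob_centers])


if __name__ == "__main__":
    ranking, scores = rank_cluster_counts(make_toy_data(), [2, 3, 4, 5])
    for k in ranking:
        print(k, round(scores[k], 3))
```

In the setting described in the abstract, the candidate partitions would instead be the solutions on the Pareto-optimal front, and the same ranking could be repeated with the SD index to compare the two orderings.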