Abstract

In the era of Big Data, cluster analysis of high-dimensional data sets often suffers from the curse of dimensionality. To overcome this problem, dimensionality reduction through feature selection becomes essential. Co-clustering, or two-way clustering, is considered a more sophisticated tool than conventional one-way clustering. Moreover, the advent of multi-view learning shows that the subjects of a data set can be interpreted in many ways. Interestingly, very few existing feature selection algorithms both take advantage of co-clustering and are designed to handle multi-view data. Motivated by this, in this article we propose a feature (gene) selection method for high-dimensional gene expression (GE) data built on a multi-objective optimization-based multi-view co-clustering algorithm (named MMCo-Clus).
A popular evolutionary technique, the Non-dominated Sorting Genetic Algorithm II (NSGA-II), serves as the underlying optimization strategy of the proposed method. First, we construct two views of a chosen data set using knowledge from two different biological data sources. Next, we develop the MMCo-Clus algorithm, which uses the constructed views to identify a set of "good" co-clustering solutions. Finally, based on a consensus operation on the co-clustering outcome, a small number of the most relevant and non-redundant features are extracted from the original feature space. The resulting reduced feature space lowers both the computational burden and the noise level of the original data. For the experimental analysis, we chose three benchmark GE data sets. The effectiveness of our feature selection method is evaluated through sample-classification accuracy, accompanied by cluster profile plots, Eisen plots, t-SNE plots, and biological and statistical significance tests. A thorough comparative analysis against existing feature selection algorithms, using external and internal evaluation metrics, supports the effectiveness of the proposed method.
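For readers unfamiliar with NSGA-II, the step that gives the algorithm its name is ranking candidate solutions into Pareto fronts and then breaking ties within a front by crowding distance. The sketch below shows that generic ranking step (fast non-dominated sorting plus crowding distance, as in the standard NSGA-II formulation) for objectives that are all minimized. It is not the MMCo-Clus implementation: the abstract does not state the objective functions, so the objective vectors in the usage example are hypothetical placeholders, and the function names are illustrative.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b, assuming every objective is minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objs: Sequence[Sequence[float]]) -> List[List[int]]:
    """Partition solution indices into Pareto fronts F1, F2, ... (NSGA-II style)."""
    n = len(objs)
    dominated_by: List[List[int]] = [[] for _ in range(n)]  # solutions that p dominates
    domination_count = [0] * n                              # how many solutions dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(objs[p], objs[q]):
                dominated_by[p].append(q)
            elif dominates(objs[q], objs[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)  # nobody dominates p: first front
    i = 0
    while fronts[i]:
        next_front: List[int] = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:  # all of q's dominators already ranked
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(objs: Sequence[Sequence[float]], front: List[int]) -> Dict[int, float]:
    """Crowding distance of each solution in one front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    if len(front) <= 2:
        return {i: float("inf") for i in front}
    for k in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue  # this objective cannot discriminate within the front
        for j in range(1, len(ordered) - 1):
            gap = objs[ordered[j + 1]][k] - objs[ordered[j - 1]][k]
            dist[ordered[j]] += gap / (hi - lo)
    return dist


# Hypothetical two-objective scores for five candidate co-clustering solutions.
scores = [(1, 5), (2, 3), (3, 1), (2, 4), (4, 4)]
fronts = fast_non_dominated_sort(scores)
print(fronts)                                # [[0, 1, 2], [3], [4]]
print(crowding_distance(scores, fronts[0]))  # {0: inf, 1: 2.0, 2: inf}
```

In NSGA-II, selection prefers a lower front index and, within the same front, a larger crowding distance, which keeps the population spread along the Pareto front rather than collapsing onto one trade-off.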