Unsupervised Feature Selection Using an Integrated Strategy of Hierarchical Clustering With Singular Value Decomposition: An Integrative Biomarker Discovery Method With Application to Acute Myeloid Leukemia.

Tapas Bhadra, Amir Sohel, Zhongming Zhao, Saurav Mallik

doi:10.1109/tcbb.2021.3110989

Abstract

In this article, we propose a novel unsupervised feature selection method that combines hierarchical feature clustering with singular value decomposition (SVD). The proposed algorithm first generates several feature clusters by applying hierarchical clustering to the feature space and then applies SVD to each of these feature clusters to identify the feature that contributes most to the SVD-entropy. The proposed feature selection method selects an optimal feature subset that not only minimizes the mutual dependency among the selected features but also, to some extent, maximizes the mutual dependency between the selected features and their nearest-neighbor non-selected features. Each selected feature also makes the largest contribution to the SVD-entropy among all features of its feature cluster. The experimental results demonstrate that the proposed algorithm performs well against several state-of-the-art feature selection methods in terms of various evaluation criteria, such as classification accuracy, redundancy rate, and representation entropy. The superiority of the proposed algorithm is demonstrated through analysis of Acute Myeloid Leukemia (AML) multi-omics data from The Cancer Genome Atlas (TCGA), consisting of five datasets: gene expression, exon expression, methylation, microRNA, and pathway activity (PARADIGM IPLs). Our analysis pinpoints a candidate gene marker, EREG, for AML with integrative omics evidence. In these datasets, EREG is targeted by two top-ranked microRNAs, hsa-miR-1286 and hsa-miR-1976. The method and results will be useful for biomarker discovery in the era of precision medicine.
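The two-stage procedure described in the abstract (cluster the features, then keep one feature per cluster by its SVD-entropy contribution) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the linkage method, the distance measure, the entropy normalization, or how a feature's contribution is computed, so the choices below are assumptions: average linkage on correlation distance, unnormalized Shannon entropy of the squared singular values, and a leave-one-out contribution score. The function names `svd_entropy` and `select_features` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def svd_entropy(block):
    """Shannon entropy of the normalized squared singular values of `block`.

    Assumption: unnormalized entropy (natural log); the paper may use a
    normalized variant.
    """
    s = np.linalg.svd(block, compute_uv=False)
    energy = s ** 2
    total = energy.sum()
    if total == 0:
        return 0.0
    p = energy / total
    p = p[p > 0]  # drop exact zeros so that log is defined
    return float(-(p * np.log(p)).sum())


def select_features(X, n_clusters):
    """Return one feature index per feature cluster of X (samples x features).

    Assumptions: average linkage with correlation distance between feature
    columns, and a leave-one-out contribution score (entropy of the cluster
    minus entropy of the cluster without the feature).
    """
    # Cluster the features, so the observations passed to linkage are columns.
    Z = linkage(X.T, method="average", metric="correlation")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")

    selected = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        if len(members) == 1:
            # A singleton cluster has only one candidate.
            selected.append(int(members[0]))
            continue
        block = X[:, members]
        full = svd_entropy(block)
        # Contribution of feature j: entropy drop when j is left out.
        contrib = [
            full - svd_entropy(np.delete(block, j, axis=1))
            for j in range(len(members))
        ]
        selected.append(int(members[int(np.argmax(contrib))]))
    return sorted(selected)
```

Calling `select_features(X, k)` on a samples-by-features matrix returns at most `k` column indices, one per cluster. Correlation distance is undefined for constant columns, so such features would need to be removed beforehand in this sketch.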

Full Text