Abstract

In recent years, gene expression data analysis has become increasingly important in machine learning and computational biology. Microarray gene datasets typically have far more features than samples, which yields an ill-posed, underdetermined system of equations. Redundant features in such high-dimensional data degrade the performance of learning algorithms and increase their computational time. Feature extraction and feature selection can both be used to address this challenge, but feature selection yields more interpretable results and has therefore received more attention. In this study, we propose an unsupervised feature selection method based on pseudo-label latent representation learning and perturbation theory. In the first step, pseudo labels are constructed using latent representation learning. In the second step, a least-squares problem is solved for both the original data matrix and a perturbed data matrix. Features are then clustered with k-means according to the similarity between the original and perturbed data matrices. In the last step, the features in each subcluster are ranked by the information gain criterion. To demonstrate the efficacy of the proposed approach, numerical experiments were carried out on six benchmark microarray datasets and two benchmark RNA-Sequencing datasets. The results indicate that the proposed technique outperforms eight state-of-the-art unsupervised feature selection methods in both clustering accuracy and normalized mutual information.
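
The four steps described above can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the authors' implementation. The abstract does not specify the latent representation model, so a truncated SVD followed by k-means stands in for pseudo-label construction. The perturbation is assumed to be small Gaussian noise. Each feature is assumed to be described by its rows in the two least-squares solutions. Information gain is computed on median-binarized features. All function names and parameters (`select_features`, `n_pseudo`, `n_clusters`, `noise`) are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of `points`."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def entropy(labels):
    """Shannon entropy (in bits) of a vector of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(feature, labels):
    """Information gain of a median-binarized feature w.r.t. hard labels."""
    high = feature > np.median(feature)
    gain = entropy(labels)
    for mask in (high, ~high):
        if mask.any():
            gain -= mask.mean() * entropy(labels[mask])
    return gain

def select_features(X, n_pseudo=3, n_clusters=5, noise=1e-2, seed=0):
    """X has shape (n_samples, n_features), typically n_features >> n_samples."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)

    # Step 1 (stand-in, assumption): latent representation from a truncated
    # SVD; hard pseudo labels from k-means on that representation.
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Y = U[:, :n_pseudo]
    pseudo_labels = kmeans(Y, n_pseudo, seed=seed)

    # Step 2: least squares on the original and on a perturbed data matrix.
    # For an underdetermined system, lstsq returns the minimum-norm solution.
    W = np.linalg.lstsq(Xc, Y, rcond=None)[0]
    Xp = Xc + noise * Xc.std() * rng.standard_normal(Xc.shape)
    Wp = np.linalg.lstsq(Xp, Y, rcond=None)[0]

    # Step 3 (assumption): cluster features on their rows of W and Wp.
    clusters = kmeans(np.hstack([W, Wp]), n_clusters, seed=seed)

    # Step 4: rank the features inside each cluster by information gain.
    gains = np.array([information_gain(X[:, j], pseudo_labels)
                      for j in range(X.shape[1])])
    ranking = {}
    for c in np.unique(clusters):
        members = np.flatnonzero(clusters == c)
        ranking[int(c)] = members[np.argsort(-gains[members])]
    return ranking, gains
```

Calling `select_features(X)` on a samples-by-genes matrix returns a dictionary mapping each feature cluster to its member indices, sorted from highest to lowest information gain, together with the per-feature gains. Taking the top few features from each cluster is one way to obtain a subset that is both informative and non-redundant.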
