Unsupervised feature extraction is crucial in machine learning and data mining for handling high-dimensional, unlabeled data. However, existing methods often ignore the relationships among features, which leads to suboptimal feature subsets. This paper reviews the current state of unsupervised feature extraction. It discusses the limitations of traditional methods such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA), particularly their limited interpretability, their sensitivity to outliers, and their high computational cost. In recent years, approaches based on information theory, sparse learning, and deep learning (e.g., deep autoencoders and generative adversarial networks) have driven significant progress in feature extraction. This paper analyzes how these methods are applied in image processing, gene analysis, text mining, and network security. For example, in image processing, deep learning methods such as Matrix Capsules with EM Routing can effectively extract key features from complex images. In text mining, unsupervised feature selection methods combined with generative adversarial networks significantly improve the efficiency of processing high-dimensional text data. Finally, this paper explores future research directions, including multimodal data processing, improved real-time processing, and integration with other machine learning techniques such as reinforcement learning and transfer learning. These directions provide insights for the further development of unsupervised feature extraction.