Abstract

Unsupervised feature selection (UFS) is a popular technique of reducing the dimensions of high-dimensional data. Previous UFS methods were often designed with the assumption that the whole information in the data set is observed. However, incomplete data sets that contain unobserved information can be often found in real applications, especially in industry. Thus, these existing UFS methods have a limitation on conducting feature selection on incomplete data. On the other hand, most existing UFS methods did not consider the sample importance for feature selection, i.e., different samples have various importance. As a result, the constructed UFS models easily suffer from the influence of outliers. This article investigates a new UFS method for conducting UFS on incomplete data sets to investigate the abovementioned issues. Specifically, the proposed method deals with unobserved information by using an indicator matrix to filter it out the process of feature selection and reduces the influence of outliers by employing the half-quadratic minimization technique to automatically assigning outliers with small or even zero weights and important samples with large weights. This article further designs an alternative optimization strategy to optimize the proposed objective function as well as theoretically and experimentally prove the convergence of the proposed optimization strategy. Experimental results on both real and synthetic incomplete data sets verified the effectiveness of the proposed method compared with previous methods, in terms of clustering performance on the low-dimensional space of the high-dimensional data.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.