Abstract

The clustering of high-dimensional predictors has drawn increasing attention in scientific areas such as text mining and biological data analysis. Standard clustering procedures group predictors using only the inherent patterns within the predictor set, and therefore lack the capacity to predict the response variable. To address this limitation, a new supervised weeding algorithm is proposed that meets the dual requirement of detecting sparse clusters and capturing prediction effects. The algorithm is based on an iterative feature screening and coherence evaluation procedure. It iteratively weeds out unimportant predictors in a backward fashion, forming sequences of nested sets from which data-driven optimal cut-offs are determined. Monte Carlo simulations are used to assess the finite-sample performance of the proposed method. The results show that both its clustering and its prediction performance are comparable to those of existing methods that each address only one of the two targets. A job description dataset is analyzed to identify significant groups of keywords that affect employees' salaries.

