The recent increase in applications of high-dimensional data poses a severe challenge to data analytics, such as supervised classification, particularly for online applications. To tackle this challenge, efficient and effective methods for feature extraction are critical to the performance of classification analysis. The objective of this work is to develop a new supervised feature extraction method for high-dimensional data. This is achieved by developing a clustered discriminant regression (CDR) that extracts informative and discriminant features from high-dimensional data. In CDR, the variables are clustered into different groups or subspaces, within which feature extraction is performed separately. The CDR algorithm, a greedy approach, is implemented to approximate the optimal feature extraction solution. A numerical study is performed to demonstrate the performance of the proposed method for variable selection. Three case studies using healthcare and additive manufacturing data sets are conducted to demonstrate the classification performance of the proposed method in real-world applications. The results show that the proposed method outperforms existing methods for feature extraction from high-dimensional data.

Note to Practitioners: This article presents a new supervised feature extraction method termed clustered discriminant regression. The method is highly effective for classification analysis of high-dimensional data, such as images or videos, where the number of variables is much larger than the number of samples. In our case studies on healthcare and additive manufacturing, classification based on our method outperforms classification based on existing feature extraction methods, which is confirmed using various popular classification algorithms.
For image classification, our method combined with carefully selected classification algorithms can outperform a convolutional neural network. In addition, the computational efficiency of the proposed method is promising, which enables online applications such as advanced manufacturing process monitoring and control.