Feature engineering is a branch of science that provides tools for, among other things, preparing feature spaces for pattern recognition tasks. The present work focuses on the problem of feature extraction. The proposed model is based on the mechanisms of principal component analysis (PCA). It fills a gap in feature extraction by searching for the spaces that best discriminate between classes. This is achieved by rotating the features according to the class centroids. In addition, a measure of their consistency is defined, which allows the number of features for a particular component to be estimated precisely. Four experiments were conducted in this study. The first two used synthetic datasets, while the other two used ten real datasets. The synthetic data made it possible to characterize the method as a function of the percentage of informative features, the number of input features, the level of imbalance, and the number of output components in the extraction task. The results show that the developed solution extracts features more precisely and thereby improves classification quality. Moreover, the method based on class centroids is shown to support the construction of efficient classifier ensembles.
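The abstract does not give the exact procedure, so the following is only a minimal sketch of one possible reading of "rotating the features according to the class centroids": the PCA-style rotation is computed from the class mean vectors instead of from all samples. The function name `centroid_rotation`, its parameters, and the toy data are hypothetical, and the consistency measure mentioned above is not covered.

```python
import numpy as np


def centroid_rotation(X, y, n_components):
    """Hypothetical sketch: derive a PCA-like rotation from class centroids.

    This is an assumed interpretation, not the procedure from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)

    # One centroid (mean feature vector) per class.
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])

    # Centre on the mean of the centroids, so each class has equal weight
    # regardless of how many samples it contains.
    centre = centroids.mean(axis=0)

    # The right singular vectors of the centred centroids are the directions
    # along which the class means spread the most. With k classes, at most
    # k - 1 of these directions carry any variance.
    _, _, vt = np.linalg.svd(centroids - centre, full_matrices=False)
    components = vt[:n_components]

    # Rotate (project) every sample onto the selected directions.
    return (X - centre) @ components.T, components


# Toy data: two classes that differ only along the first feature.
X = np.array([
    [0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0],   # class 0
    [4.0, 0.0, 1.0], [4.0, 0.0, -1.0], [4.0, 0.0, 0.0],   # class 1
])
y = np.array([0, 0, 0, 1, 1, 1])

Z, W = centroid_rotation(X, y, n_components=1)
```

In this toy case the single extracted component lines up with the first feature, which is the only direction separating the two centroids, whereas ordinary PCA on all samples would also be influenced by the within-class spread along the other two features.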