Abstract

Principal component analysis (PCA) is one of the most effective and widely used dimensionality-reduction techniques; it concentrates the maximal variance in the first few components. In many applications, the first few principal components are used in place of the original variables in statistical and machine learning models; however, this practice does not guarantee that the components most relevant to the target variable are selected. This paper presents an efficient approach, called fully component selection (FCS), in which all components, rather than only the first few, are considered as inputs to a random forest (RF) model, and a feature selection algorithm is integrated with the RF to select the components most relevant to the target. The proposed method was evaluated on spectroscopic data comprising 70 soil samples with 2050 spectral bands to estimate soil organic carbon (SOC). RF-FCS was compared with three RF baselines: one using the original features, one using only the first few components, and one using all components without feature selection. The results showed that RF-FCS substantially outperformed the other approaches, increasing R² by between 25.2% and 55.5%. Furthermore, the principal components (PCs) most relevant to the target variable were not the first few but PC6, PC7, PC8, PC15, and PC43. Because the FCS approach substantially improved model performance, coupling it with machine learning and statistical models is strongly recommended for high-dimensional data applications.
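The core idea is to rank all components by relevance to the target rather than by explained variance. The abstract does not name the feature selection algorithm coupled with the RF, so the sketch below substitutes a simple filter for illustration. It keeps every principal component with non-zero variance and ranks the components by absolute Pearson correlation with the target. Several elements are assumptions and not the authors' RF-FCS procedure: the synthetic data, the hypothetical helpers `pca_scores` and `rank_components`, and the correlation criterion. In practice the top-ranked components would then be passed to a random forest.

```python
import numpy as np


def pca_scores(X):
    """Scores of every principal component of X, ordered by explained variance (PC1 first)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    keep = s > s[0] * 1e-10  # drop numerically zero components (rank <= n - 1 after centering)
    return U[:, keep] * s[keep]


def rank_components(scores, y):
    """Rank components by |Pearson correlation| with y.

    Assumption: a filter criterion standing in for the RF-integrated
    feature selection, which the abstract does not specify.
    """
    yc = y - y.mean()
    sc = scores - scores.mean(axis=0)
    corr = (sc.T @ yc) / (np.linalg.norm(sc, axis=0) * np.linalg.norm(yc))
    order = np.argsort(-np.abs(corr))  # most relevant component first
    return order, corr


# Hypothetical synthetic data (not the soil spectra): 70 samples, 200 "bands".
rng = np.random.default_rng(0)
n, p = 70, 200
A = rng.standard_normal((n, 10))
A -= A.mean(axis=0)
Q, _ = np.linalg.qr(A)                              # 10 orthonormal, centered latent directions
W, _ = np.linalg.qr(rng.standard_normal((p, 10)))   # 10 orthonormal loading vectors
strengths = np.array([100, 50, 30, 20, 10, 8, 6, 4, 2, 1], dtype=float)
X = (Q * strengths) @ W.T + 1e-3 * rng.standard_normal((n, p))
y = 2 * Q[:, 6] + Q[:, 8]                           # target driven by low-variance directions

scores = pca_scores(X)                              # all components, not just the first few
order, corr = rank_components(scores, y)
selected = sorted(order[:2].tolist())               # 0-based indices of the two most relevant PCs
print([f"PC{i + 1}" for i in selected])             # expected: ['PC7', 'PC9']
```

In this toy setup, keeping only the first few components (PC1 to PC5) would discard both directions that drive the target. Ranking all components recovers them, which mirrors the abstract's finding that the relevant PCs were not the leading ones.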
