Abstract
The choice of training set largely determines the quality of a model. In spectral analysis, various interference factors cause the collected spectral data of some samples to deviate seriously. If such samples are used directly in modeling, they introduce bias into the model. To obtain the most representative samples, it is therefore necessary to select samples before the model is established. This paper proposes a two-dimensional sample selection (TDSS) method, which selects samples from two perspectives: spectral data quality and variable correlation. TDSS and the Mahalanobis distance method were each applied to dynamic spectrum (DS) data to screen samples, and the samples screened by each method were used for modeling. A partial least squares (PLS) linear regression model with a quadratic nonlinear correction was then established to predict the target components. The experimental results show that TDSS significantly improved the accuracy and prediction performance of the model and outperformed the Mahalanobis distance method. In the prediction of triglyceride and total cholesterol, the correlation coefficient reached above 0.82. These results demonstrate the effectiveness of the proposed sample selection method and its marked effect on the accuracy and robustness of the model. This paper provides a new approach to selecting the modeling set in spectral analysis of complex solutions.
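For readers unfamiliar with the Mahalanobis distance baseline mentioned above, the following is a minimal sketch of distance-based sample screening. It is not the TDSS method, whose details the abstract does not give. The function name `mahalanobis_screen`, the cutoff of 3.0, the synthetic data, and the use of a pseudo-inverse of the covariance matrix are all assumptions made for illustration; in practice the distance is often computed on a reduced representation of the spectra rather than on raw wavelengths.

```python
import numpy as np


def mahalanobis_screen(X, threshold=3.0):
    """Flag samples whose Mahalanobis distance from the data mean exceeds a cutoff.

    X         : array of shape (n_samples, n_features)
    threshold : hypothetical cutoff, not taken from the paper
    Returns (keep_mask, distances).
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Covariance across features; pinv tolerates a singular covariance matrix.
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    cov_inv = np.linalg.pinv(cov)
    # Squared distance for each sample: d_i^2 = c_i^T * cov_inv * c_i
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    distances = np.sqrt(np.maximum(d2, 0.0))
    return distances <= threshold, distances


# Synthetic demonstration data (not the paper's dynamic spectrum data).
rng = np.random.default_rng(0)
samples = rng.normal(size=(50, 3))
samples = np.vstack([samples, [20.0, 20.0, 20.0]])  # one deliberately deviant sample

keep, dist = mahalanobis_screen(samples)
screened = samples[keep]  # samples retained for modeling
```

The retained rows in `screened` would then be passed to the regression step. The cutoff trades off how many samples are discarded against how much deviant data remains, so it should be tuned for the data set at hand.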