Abstract
The Random Forest (RF) technique has shown promise for supervised classification across different matrices. However, approaches for identifying the significant variables that weight the model are scarce in classification problems. In this paper, we propose a methodology for selecting the most relevant variables in the construction of RF models. To apply this methodology, classification models were developed to discriminate crude oil samples according to their maximum pour point (MPP). To this end, MPP data (ASTM D5853) were acquired for 105 crude oil samples, along with their hydrogen (1H) and carbon (13C) NMR spectra. With MPP ranging from −54 °C to 39 °C, two classes were assigned: the first containing 43 samples with MPP ≤ −9 °C, and the second containing 62 samples with MPP > −9 °C. The 1H NMR model, with 90% accuracy, and the 13C NMR model, with 71% accuracy, were used in the variable selection method. The results showed that the proposed variable selection methodology was effective in distinguishing the variables that contributed most to the discrimination of the oils. This new tool therefore enabled a greater understanding of the chemical information of interest contained in the spectra and its relationship with the MPP of the crude oil samples.
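The two-class assignment and the accuracy figure quoted above can be illustrated with a minimal sketch. It assumes only the −9 °C boundary stated in the abstract. The function names and the example MPP values are hypothetical, chosen for illustration, and are not measurements or code from the study. The sketch does not reproduce the RF models or the variable selection method itself.

```python
# Minimal sketch of the class assignment described in the abstract.
# MPP_THRESHOLD_C comes from the text; everything else is illustrative.
MPP_THRESHOLD_C = -9.0  # class boundary in degrees Celsius


def assign_class(mpp_celsius: float) -> int:
    """Return 1 for MPP <= -9 °C and 2 for MPP > -9 °C."""
    return 1 if mpp_celsius <= MPP_THRESHOLD_C else 2


def accuracy(true_labels, predicted_labels) -> float:
    """Fraction of samples whose predicted class matches the true class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Hypothetical MPP values (°C) spanning the reported range of -54 to 39.
example_mpp = [-54.0, -9.0, -8.5, 39.0]
example_classes = [assign_class(v) for v in example_mpp]
```

A sample with MPP exactly at −9 °C falls into the first class, because the abstract defines that class with an inclusive bound (≤).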