DATA MINING IN ORGANIC GEOCHEMISTRY: CASE STUDY IN POTIGUAR BASIN

Sarah Barrón Torres, Ygor Dos Santos Rocha, Ítalo De Oliveira Matias, Erica Tavares De Morais, Francisco Fábio De Araújo Ponte, Fabiano Galdino Leal, Mario Duncan Rangel

doi:10.5016/geociencias.v41i1.16161

Abstract

The volume of data from geochemical analyses of samples collected in oil wells grows in step with investment in the exploration and production sector. The treatment and interpretation of these results, however, still depend heavily on experts and are time-consuming. As extensive databases accumulate, data mining offers a good alternative for exploring them through statistical methods and computational algorithms, bringing a technological edge and agility to the process. These tools were applied experimentally to data from 200 oils from the Potiguar Basin, leading to a proposed workflow that predicts their genetic classification with reasonable accuracy. Using multidimensional scaling (MDS) and clustering (dendrogram and k-means), the 60 initial attributes were reduced to an optimal set of 26. With machine learning, median accuracies of 92.50% were obtained with the Decision Tree algorithm, 95.00% with Random Forest, and 87.50% with an Artificial Neural Network. Comparison with an analysis previously presented in the literature shows the efficiency gains that adopting the proposed methodology can bring.

Keywords: Organic geochemistry; Data Mining; Multivariate Statistics; Workflow.
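As an illustration of the k-means clustering step named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the toy points are synthetic placeholders for two hypothetical geochemical attributes (not data from the Potiguar Basin study), the function names are invented for this example, and the farthest-first seeding is an assumption chosen only to keep the result deterministic.

```python
from statistics import mean


def sq_dist(p, q):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic seeding (an assumption, not from the paper):
    start at the first point, then repeatedly add the point that is
    farthest from all seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the centroids stop moving or n_iter is reached."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each sample goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(mean(col) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Synthetic toy samples: two well-separated groups in two hypothetical attributes.
group_a = [(0.1 * i, 0.2 + 0.05 * i) for i in range(5)]
group_b = [(5 + 0.1 * i, 6 - 0.05 * i) for i in range(5)]
samples = group_a + group_b

centroids, labels = kmeans(samples, k=2)
print(labels)
```

On real data the attributes would normally be scaled to comparable ranges before clustering, since k-means relies on Euclidean distance; whether and how the study scaled its attributes is not stated in the abstract.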
