This paper examines strategies for improving the definition of the prior probabilities and/or likelihoods of naïve Bayes (NB) classifiers. The standard NB method computes likelihoods from normal distributions and priors from the entire training data set. NB is one of the most widely used classification methods in data mining and machine learning. Although NB classifiers perform reasonably well when given good training data, they rest on the strong assumption of conditional independence between attributes. Several algorithms have been proposed to improve the effectiveness of NB classifiers by inserting discriminative approaches into their generative structure. From a reliability perspective, the standard NB approach might not exploit the real capabilities of NB classifiers in the lithologic classification problem. To address this concern, four strategies are proposed and compared with the standard NB classification outcomes. First, kernel density estimation (KDE) is used to improve the likelihood models. Second, the NB classifier is applied in parts, by separating the training data set into individual wells within a committee architecture. Third, a tuning strategy estimates the prior probabilities automatically through an optimization scheme. Fourth, a novel alternative, named CRC, adapts the standard NB classifier by defining priors from depth zones derived from regional stratigraphic information and applying the NB classifier within each zone. We carry out an extensive statistical investigation, based on precision, recall, classification errors, F-scores and confusion matrices, to identify the most relevant NB strategy for the classification of electrofacies.
Although all of the above-mentioned strategies yield decent classification outcomes, CRC can be considered, by a narrow margin, the most effective improvement over the standard NB for classifying rock units. Tests are performed on a validation well (7-MP-50D-BA) of the Massapê Field, in the Recôncavo Basin, northeast Brazil, to highlight the classification particularities of the improved strategy. According to the confusion matrix analysis, the classification of sandstones improves significantly (from 68% to 83% accuracy). A small decrease is observed in the classification of shales and slurries (from 92% to 90% and from 63% to 56%, respectively), which is acceptable according to the F-scores and errors. This result reinforces the relevance of using the NB classifier jointly with prior geologic information to optimize lithologic classification.
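
As a rough illustration of two of the ideas above (KDE likelihoods in place of normal distributions, and priors that can be overridden per depth zone), the following minimal sketch may help. It is not the authors' implementation: the class name KdeNaiveBayes, the two log features (assumed here to be gamma ray and bulk density), the bandwidths, the toy values and the zone priors are all hypothetical and chosen only to make the example run.

```python
import math
from collections import defaultdict


def gaussian_kde_pdf(x, samples, bandwidth):
    """Evaluate a 1-D Gaussian kernel density estimate at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


class KdeNaiveBayes:
    """Naive Bayes with per-feature KDE likelihoods (hypothetical sketch)."""

    def __init__(self, bandwidths):
        self.bandwidths = bandwidths  # one kernel bandwidth per feature
        self.samples = {}             # class -> list of per-feature samples
        self.priors = {}              # class -> relative frequency in training

    def fit(self, X, y):
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        for label, rows in grouped.items():
            self.samples[label] = [list(col) for col in zip(*rows)]
            self.priors[label] = len(rows) / len(y)
        return self

    def posterior(self, x, priors=None):
        """Class posteriors; `priors` optionally replaces the global priors,
        e.g. with priors assumed for a given depth zone."""
        priors = self.priors if priors is None else priors
        log_scores = {}
        for label, cols in self.samples.items():
            score = math.log(priors.get(label, 0.0) + 1e-12)
            # Conditional independence: sum the per-feature log-likelihoods.
            for value, col, bw in zip(x, cols, self.bandwidths):
                score += math.log(gaussian_kde_pdf(value, col, bw) + 1e-300)
            log_scores[label] = score
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        total = sum(weights.values())
        return {k: w / total for k, w in weights.items()}

    def predict(self, x, priors=None):
        post = self.posterior(x, priors)
        return max(post, key=post.get)


# Toy training data (invented values): [gamma ray, bulk density]
X = [[40, 2.30], [45, 2.35], [50, 2.32], [110, 2.55], [120, 2.60], [115, 2.58]]
y = ["sandstone"] * 3 + ["shale"] * 3
model = KdeNaiveBayes(bandwidths=[10.0, 0.05]).fit(X, y)

# Hypothetical depth-zone priors, standing in for stratigraphic information.
sand_zone = {"sandstone": 0.9, "shale": 0.1}
shale_zone = {"sandstone": 0.1, "shale": 0.9}
ambiguous = [80, 2.45]  # roughly midway between the two classes

print(model.predict([48, 2.33]))
print(model.predict(ambiguous, priors=sand_zone))
print(model.predict(ambiguous, priors=shale_zone))
```

In this toy setting the ambiguous sample has nearly equal likelihoods under both classes, so the zone prior decides the outcome. This is the kind of effect a depth-zone prior is meant to have, although the paper's actual zoning and prior values are not given in this abstract.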