Abstract

Bayesian Network (BN) is a classification technique widely used in Artificial Intelligence. Its struct ure is a Direct Acyclic Graph (DAG) used to model the association of categorical variables. However, in cases w here the variables are numerical, a previous discretizat ion is necessary. Discretization methods are usuall y based on a statistical approach using the data distribution, such as division by quartiles. In this article we present a discretization using a heuristic that identifies ev ents called peak and valley. Genetic Algorithm was used to identify these events having the minimization of th e error between the estimated average for BN and th e actual value of the numeric variable output as the objecti ve function. The BN has been modeled from a database of Bit’s Rate of Penetration of the Brazilian pre-salt layer with 5 numerical variables and one categoric al variable, using the proposed discretization and the division of the data by the quartiles. The results show that the proposed heuristic discretization has higher accura cy than the quartiles discretization.

Highlights

  • A Bayesian Network (BN) allows modeling the probability distribution or using statistic parameters like knowledge of a domain through a set of usually categorical the frequency in each class

  • In this article we present a discretization using a heuristic that identifies events called peak and valley

  • We present a heuristic discretization for Bayesian Networks that seeks to find data patterns and divide the data set according to them

Read more

Summary

Introduction

A Bayesian Network (BN) allows modeling the probability distribution or using statistic parameters like knowledge of a domain through a set of usually categorical the frequency in each class. There is no guarantee that all variables of an application domain will be categorical, since there will be situations where numerical variables participate directly in the domain context. The discretization can be made by the experts on the field in a manual way It can be a complex task: There are cases where the data does not follow any visible pattern and when it does, this pattern may change in different occasions. To perform discretization for this domain, it is necessary to consider the conditional distributions of each variable of the process and how they previous discretization of the variables is recommended, influence the network as a whole

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call