HEURISTIC DISCRETIZATION METHOD FOR BAYESIAN NETWORKS

Lima Lima

doi:10.3844/jcssp.2014.869.878

Abstract

A Bayesian Network (BN) is a classification technique widely used in Artificial Intelligence. Its structure is a Directed Acyclic Graph (DAG) used to model the association of categorical variables. However, when the variables are numerical, a previous discretization is necessary. Discretization methods are usually based on a statistical approach using the data distribution, such as division by quartiles. In this article we present a discretization using a heuristic that identifies events called peak and valley. A Genetic Algorithm was used to identify these events, with an objective function that minimizes the error between the average estimated by the BN and the actual value of the numerical output variable. The BN was modeled from a database of the bit Rate of Penetration (ROP) in the Brazilian pre-salt layer, with 5 numerical variables and one categorical variable, using both the proposed discretization and the division of the data by quartiles. The results show that the proposed heuristic discretization achieves higher accuracy than the quartile discretization.
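The Genetic Algorithm's objective, the error between the BN's estimated average and the actual numeric output, can be sketched as a small fitness function. This is a minimal sketch under assumptions the article does not state: that the BN returns a probability for each discretized output class, that each class is represented by the mean of the training values that fell into it, and that the error is the mean absolute error. The names `expected_value` and `mean_absolute_error` are hypothetical.

```python
from typing import Sequence


def expected_value(class_probs: Sequence[float], class_means: Sequence[float]) -> float:
    """Probability-weighted mean of the output classes (hypothetical BN point estimate).

    Assumes class_probs[k] is the BN posterior for output class k and
    class_means[k] is the mean of the training values discretized into class k.
    """
    if len(class_probs) != len(class_means):
        raise ValueError("class_probs and class_means must have the same length")
    return sum(p * m for p, m in zip(class_probs, class_means))


def mean_absolute_error(estimates: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute gap between estimates and observed values (assumed error measure)."""
    if not actual or len(estimates) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - a) for e, a in zip(estimates, actual)) / len(actual)


# Usage: two test rows, three output classes (illustrative numbers, not article data).
class_means = [10.0, 20.0, 30.0]
posteriors = [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]]
estimates = [expected_value(p, class_means) for p in posteriors]  # [17.5, 22.5]
print(mean_absolute_error(estimates, [16.0, 25.0]))  # 2.0
```

A lower value means a better discretization under this assumed objective, so a search procedure such as a Genetic Algorithm would keep the candidates that minimize it.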

Highlights

A Bayesian Network (BN) allows modeling the knowledge of a domain through a set of usually categorical variables
In this article we present a discretization using a heuristic that identifies events called peak and valley
We present a heuristic discretization for Bayesian Networks that seeks to find data patterns and divide the data set according to them
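The peak-and-valley idea can be illustrated with a hypothetical sketch. The article does not define here how a peak or a valley is detected, nor how the Genetic Algorithm parameterizes these events; this sketch assumes that peaks and valleys are strict local maxima and minima of an equal-width frequency histogram, and it places a cut point at the centre of each valley bin so that the classes follow the modes of the data. The functions `histogram`, `peaks_and_valleys` and `valley_cut_points` and the `n_bins` parameter are illustrative, not the author's.

```python
from typing import List, Sequence, Tuple


def histogram(values: Sequence[float], n_bins: int) -> Tuple[List[int], List[float]]:
    """Equal-width histogram: returns the counts and the n_bins + 1 bin edges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant data (zero width)
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    return counts, edges


def peaks_and_valleys(counts: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Indices of interior bins that are strict local maxima (peaks) or minima (valleys)."""
    peaks, valleys = [], []
    for i in range(1, len(counts) - 1):
        if counts[i - 1] < counts[i] > counts[i + 1]:
            peaks.append(i)
        elif counts[i - 1] > counts[i] < counts[i + 1]:
            valleys.append(i)
    return peaks, valleys


def valley_cut_points(values: Sequence[float], n_bins: int = 10) -> List[float]:
    """Hypothetical heuristic: cut the data at the centre of every valley bin."""
    counts, edges = histogram(values, n_bins)
    _, valleys = peaks_and_valleys(counts)
    return [(edges[i] + edges[i + 1]) / 2 for i in valleys]


values = [0, 1, 3, 3, 3, 5, 7, 7, 7, 9, 10]
print(histogram(values, 5)[0])           # [2, 3, 1, 3, 2]
print(valley_cut_points(values, n_bins=5))  # [5.0]
```

Here the two modes around 3 and 7 are separated by a single cut at 5.0. In the article a Genetic Algorithm searches for the events; searching over a parameter such as `n_bins` against an error objective is one possible way to mirror that, but it is only an assumption of this sketch.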

Summary

Introduction

A Bayesian Network (BN) allows modeling the knowledge of a domain through a set of usually categorical variables. There is no guarantee that all variables of an application domain will be categorical, since there will be situations where numerical variables participate directly in the domain context; in these cases, a previous discretization of the variables is recommended. The discretization can be done manually by experts in the field, or based on the probability distribution or on statistical parameters such as the frequency in each class. It can be a complex task: there are cases where the data do not follow any visible pattern and, when they do, this pattern may change on different occasions. To perform discretization for this domain, it is necessary to consider the conditional distributions of each variable of the process and how they influence the network as a whole.
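Division by quartiles, the statistical baseline against which the heuristic is compared, places roughly the same number of observations in each of four classes. It can be sketched with the Python standard library: `statistics.quantiles` (Python 3.8+) returns the three cut points and `bisect.bisect_right` assigns each value to a class. The choice of `method="inclusive"` and the convention that a value equal to a cut point falls into the upper class are assumptions, since the article does not state how the quartiles or ties are computed; the function names are hypothetical.

```python
import bisect
import statistics
from typing import List, Sequence


def quartile_cut_points(values: Sequence[float]) -> List[float]:
    """Three cut points (Q1, median, Q3) that split the data into four classes."""
    return statistics.quantiles(values, n=4, method="inclusive")


def discretize(values: Sequence[float], cut_points: Sequence[float]) -> List[int]:
    """Map each value to a class index 0..len(cut_points); ties go to the upper class."""
    return [bisect.bisect_right(cut_points, v) for v in values]


cuts = quartile_cut_points([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(cuts)                                # [3.0, 5.0, 7.0]
print(discretize([1, 3, 4, 5, 8, 9], cuts))  # [0, 1, 1, 2, 3, 3]
```

Because quartile cut points depend only on the ranks of the values, they can fall inside a dense cluster rather than between clusters, which is the kind of pattern a data-driven heuristic tries to respect.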

Methods

Results

Discussion

Conclusion