Abstract

Typical problems in bioinformatics involve large discrete datasets. Therefore, in order to apply statistical methods in such domains, it is important to develop efficient algorithms suitable for discrete data. The minimum description length (MDL) principle is a theoretically well-founded, general framework for performing statistical inference. The mathematical formalization of MDL is based on the normalized maximum likelihood (NML) distribution, which has several desirable theoretical properties. In the case of discrete data, straightforward computation of the NML distribution requires exponential time with respect to the sample size, since the definition involves a sum over all the possible data samples of a fixed size. In this paper, we first review some existing algorithms for efficient NML computation in the case of multinomial and naive Bayes model families. Then we proceed by extending these algorithms to more complex, tree-structured Bayesian networks.
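As a brief reminder of the object the abstract refers to (the notation below is a conventional rendering, not taken verbatim from the paper), the NML distribution for a model class $\mathcal{M}$ and a discrete data sample $x^n$ of size $n$ is the maximized likelihood normalized over all possible samples of the same size:

$$
P_{\mathrm{NML}}(x^n \mid \mathcal{M}) \;=\; \frac{P\big(x^n \mid \hat{\theta}(x^n), \mathcal{M}\big)}{\sum_{y^n} P\big(y^n \mid \hat{\theta}(y^n), \mathcal{M}\big)},
$$

where $\hat{\theta}(\cdot)$ denotes the maximum likelihood parameters for the given sample. The denominator is the normalizing sum mentioned in the abstract; its logarithm is the regret (parametric complexity), and the naive evaluation of this sum over all $y^n$ is what requires exponential time.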

Highlights

  • Many problems in bioinformatics can be cast as model class selection tasks, that is, as tasks of selecting among a set of competing mathematical explanations the one that best describes a given sample of data

  • The minimum description length (MDL) principle developed in the series of papers [6,7,8] is a well-founded, general framework for performing model class selection and other types of statistical inference

  • The model families used in our work are Bayesian networks of varying complexity

Summary

INTRODUCTION

Many problems in bioinformatics can be cast as model class selection tasks, that is, as tasks of selecting among a set of competing mathematical explanations the one that best describes a given sample of data. The NML distribution offers a theoretically well-founded criterion for this task. For multinomial (discrete) data, its definition involves a normalizing sum over all the possible data samples of a fixed size. The logarithm of this sum is called the regret or parametric complexity, and it can be interpreted as the complexity of the model class. The NML distribution has several theoretical optimality properties, which make it a very attractive candidate for performing model class selection and related tasks. A more complex case involving a multidimensional model family, called naive Bayes, was discussed in [16]. Both the multinomial and the naive Bayes cases are reviewed in this paper.
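The quadratic- and linear-time algorithms listed in the outline below concern exactly this normalizing sum. As an illustration only (function names are ours, and we assume the standard Kontkanen–Myllymäki recurrence C(n, K) = C(n, K-1) + (n / (K-2)) · C(n, K-2) for the multinomial normalizing sum), the following sketch contrasts the naive exponential-style summation with the linear-time recurrence:

```python
from math import comb

def nml_normalizer_brute(K, n):
    """Naive evaluation of the multinomial NML normalizing sum:
    sum over all count vectors (h_1, ..., h_K) with sum n of
    multinomial(n; h) * prod_k (h_k / n)^{h_k}.  Feasible only for tiny K, n."""
    def rec(k, remaining):
        if k == K - 1:  # last category gets all remaining counts
            h = remaining
            return (h / n) ** h  # Python: 0.0 ** 0 == 1.0, matching 0^0 := 1
        total = 0.0
        for h in range(remaining + 1):
            # comb(remaining, h) accumulates the multinomial coefficient
            total += comb(remaining, h) * (h / n) ** h * rec(k + 1, remaining - h)
        return total
    return rec(0, n)

def nml_normalizer_linear(K, n):
    """Linear-time evaluation via the assumed recurrence
    C(n, K) = C(n, K-1) + (n / (K-2)) * C(n, K-2),
    with C(n, 1) = 1 and C(n, 2) computed directly in O(n)."""
    c1 = 1.0
    c2 = sum(comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
             for h in range(n + 1))
    if K == 1:
        return c1
    prev, curr = c1, c2
    for k in range(3, K + 1):
        prev, curr = curr, curr + (n / (k - 2)) * prev
    return curr
```

For example, for K = 3 categories and n = 2 observations both routines yield 4.5: three single-category samples each contribute a maximized likelihood of 1, and the six two-category sequences contribute 1/4 each. In practice one would work with the logarithm of these quantities to avoid overflow for large n.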

PROPERTIES OF THE MDL PRINCIPLE AND THE NML MODEL
Model classes and families
The NML distribution
NML FOR MULTINOMIAL MODELS
The model family
The quadratic-time algorithm
The linear-time algorithm
Approximating the multinomial NML
NML FOR THE NAIVE BAYES MODEL
NML FOR BAYESIAN FORESTS
The algorithm
Leaves
Inner nodes
Component tree roots
1: Count all frequencies f_{ikl} and f_{il} from the data x^n
CONCLUSION