Application of an efficient Bayesian discretization method to biomedical data

Jonathan L Lustgarten, Vanathi Gopalakrishnan, Shyam Visweswaran, Gregory F Cooper

doi:10.1186/1471-2105-12-309

Open Access

https://doi.org/10.1186/1471-2105-12-309

Abstract

Background: Several data mining methods require data that are discrete, and other methods often perform better with discrete data. We introduce an efficient Bayesian discretization (EBD) method for optimal discretization of variables that runs efficiently on high-dimensional biomedical datasets. The EBD method consists of two components, namely, a Bayesian score to evaluate discretizations and a dynamic programming search procedure to efficiently search the space of possible discretizations. We compared the performance of EBD to Fayyad and Irani's (FI) discretization method, which is commonly used for discretization.

Results: On 24 biomedical datasets obtained from high-throughput transcriptomic and proteomic studies, the classification performances of the C4.5 classifier and the naïve Bayes classifier were statistically significantly better when the predictor variables were discretized using EBD over FI. EBD was statistically significantly more stable to the variability of the datasets than FI. However, EBD was less robust, though not statistically significantly so, than FI and produced slightly more complex discretizations than FI.

Conclusions: On a range of biomedical datasets, a Bayesian discretization method (EBD) yielded better classification performance and stability but was less robust than the widely used FI discretization method. The EBD discretization method is easy to implement, permits the incorporation of prior knowledge and belief, and is sufficiently fast for application to high-dimensional data.

Highlights

Several data mining methods require data that are discrete, and other methods often perform better with discrete data
We introduce a new supervised univariate discretization method called efficient Bayesian discretization (EBD)
EBD consists of i) a Bayesian score to evaluate discretizations, and ii) a dynamic programming search method to locate the optimal discretization in the space of possible discretizations
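
The two-component structure described above (a score for each candidate discretization plus a dynamic programming search) can be illustrated with a minimal Python sketch. This is not the paper's EBD score: the per-interval score below is an assumed stand-in (the log marginal likelihood of the class labels under a uniform Dirichlet prior), and `log_interval_penalty` is a hypothetical structure prior that charges a fixed cost per interval. The names `interval_log_score` and `discretize` are illustrative only.

```python
from math import lgamma


def interval_log_score(counts):
    """Log marginal likelihood of the class labels in one interval.

    Assumption: a uniform Dirichlet(1, ..., 1) prior over class
    probabilities. This is a stand-in, not the EBD score itself.
    """
    k = len(counts)
    n = sum(counts)
    return lgamma(k) - lgamma(k + n) + sum(lgamma(1 + c) for c in counts)


def discretize(values, labels, log_interval_penalty=-1.0):
    """Find the highest-scoring set of cut points by dynamic programming.

    log_interval_penalty is a hypothetical log prior cost per interval.
    Returns (sorted cut points, total log score).
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    classes = {c: i for i, c in enumerate(dict.fromkeys(labels))}
    n = len(pairs)

    # prefix[i][c] = number of the first i sorted instances with class c
    prefix = [[0] * len(classes)]
    for _, y in pairs:
        row = prefix[-1][:]
        row[classes[y]] += 1
        prefix.append(row)

    neg_inf = float("-inf")
    best = [0.0] + [neg_inf] * n  # best[i]: best score for first i instances
    back = [0] * (n + 1)          # back[i]: start index of the last interval

    for i in range(1, n + 1):
        # Never place a boundary between two identical values.
        if i < n and pairs[i - 1][0] == pairs[i][0]:
            continue
        for j in range(i):
            if best[j] == neg_inf:
                continue
            counts = [a - b for a, b in zip(prefix[i], prefix[j])]
            score = best[j] + interval_log_score(counts) + log_interval_penalty
            if score > best[i]:
                best[i], back[i] = score, j

    # Walk the back-pointers to recover cut points as midpoints.
    cuts = []
    i = n
    while back[i] > 0:
        j = back[i]
        cuts.append((pairs[j - 1][0] + pairs[j][0]) / 2)
        i = j
    return sorted(cuts), best[n]


if __name__ == "__main__":
    cuts, score = discretize([1, 2, 3, 4, 10, 11, 12, 13],
                             [0, 0, 0, 0, 1, 1, 1, 1])
    print(cuts, score)
```

Because the assumed score is a sum over intervals, the best discretization of the first i sorted instances can be built from the best discretization of a shorter prefix, which is what makes the dynamic programming search applicable. This sketch evaluates every pair of boundaries, so it runs in time quadratic in the number of instances per variable.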

Summary

Introduction

Several data mining methods require data that are discrete, and other methods often perform better with discrete data. With the advent of high-throughput techniques, such as DNA microarrays and mass spectrometry, transcriptomic and proteomic studies are generating an abundance of high-dimensional biomedical data. The analysis of such data presents significant analytical and computational challenges, and data mining techniques are increasingly being applied to these data with promising results [1,2,3,4]. A typical task in such analysis entails, for example, learning a mathematical model from gene expression or protein expression data that accurately predicts a phenotype, such as disease or health. In data mining, such a task is called classification, and the model that is learned is termed a classifier. A variety of discretization methods have been developed for converting continuous data to discrete data [5,6,7,8,9,10,11], and one that is commonly used is Fayyad and Irani's (FI) discretization method [9].

Journal: BMC Bioinformatics	Publication Date: Jul 28, 2011
Citations: 81	License type: cc-by
