Numeric mapping and learnability of Naive Bayes

Harry Zhang, Charles X Ling

doi:10.1080/713827178

Abstract

In data-mining applications, it is common to transform (or map) nominal attributes into numeric ones in order to apply a specific model. However, a nominal attribute typically has no inherent order among its values and no geometric meaning. This raises two questions: does such a transformation change the properties of a nominal function, and how can the geometric complexity of a nominal function be measured independently of the mapping? This paper discusses the conversion of a nominal function into a numeric one. We propose a three-layer measure for the geometric linearity of a nominal function and explore the geometric properties of a nominal function independent of the mapping. Naive Bayes is one of the most efficient and effective inductive-learning algorithms for data mining. It is well known that Naive Bayes is linear in the binary domain; that is, it can learn only linearly separable functions. Using the proposed three-layer linearity measure to investigate its geometric properties, we show that Naive Bayes is actually nonlinear in the nominal domain, of which the binary domain is a special case. Our work helps researchers understand how numeric mapping influences the properties of a nominal function and how it affects the learnability of Naive Bayes.
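The dependence on the mapping can be seen in a very small case. The sketch below is an illustration only and is not the three-layer measure proposed in the paper: it takes a hypothetical nominal function `f` over one attribute with three values (the names `red`, `green`, `blue` and the labels are assumptions), tries every assignment of the codes 0, 1, 2 to those values, and checks whether a single threshold on the number line separates the two classes. A one-dimensional threshold is used here as a stand-in for linear separability.

```python
from itertools import permutations


def threshold_separable(points):
    """Return True if one threshold on the real line separates the labels.

    points: list of (x, label) pairs with distinct x values, labels in {0, 1}.
    """
    ordered = [label for _, label in sorted(points)]
    # In one dimension the classes are separable exactly when the label
    # changes at most once as we move along the sorted axis.
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return changes <= 1


# Hypothetical nominal function: one attribute, three unordered values.
f = {"red": 1, "green": 0, "blue": 1}

# Try every numeric mapping of the three values onto the codes 0, 1, 2.
results = {}
for codes in permutations(range(3)):
    mapping = dict(zip(f, codes))
    results[codes] = threshold_separable([(mapping[v], f[v]) for v in f])

for codes, ok in results.items():
    print(dict(zip(f, codes)), "separable" if ok else "not separable")
```

The same function is separable under some mappings and not under others, namely those that place `green` between the other two values. This is why a measure of geometric complexity that does not depend on the chosen mapping is of interest.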
