Abstract

The naive Bayes classifier (NBC) is an effective classification technique in data mining and machine learning that is based on the attribute conditional independence assumption. However, this assumption rarely holds in real-world applications, so numerous studies have sought to relax it through attribute weighting. To the best of our knowledge, almost all of these studies calculate attribute weights from a correlation measure or from classification accuracy. In this paper, we propose a novel causality-based attribute weighting method to build a weighted NBC called IFG-WNBC, in which causal information flow (IF) theory and a genetic algorithm (GA) are used to search for optimal weights. The introduction of IF yields a brand-new weighting criterion based on causality rather than correlation. The population initialization in the GA is also improved with IF-based weights for more efficient optimization. Multiple sets of comparison experiments on UCI data sets demonstrate that IFG-WNBC outperforms the classic NBC and other common weighted NBC algorithms in classification accuracy and running time.

Highlights

  • The 21st century is characterized by information explosion

  • In the GA-based weighted NBC of Bao [17], attribute weights calculated from information theory and algebraic theory are set as the initial population, and the classification accuracy is defined as the fitness function

  • The proposed IFG-WNBC is built in three major steps: (i) use causal information flow (IF) to compute the initial weights and generate a good initial population; (ii) starting from this initial population, use a genetic algorithm (GA) to search for the optimal combination of attribute weights according to classification error; and (iii) classify test instances with IFG-WNBC using the learned attribute weights
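
The three steps can be sketched roughly in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the IF-based weight computation is not given in this summary, so `seed_weights` is a hypothetical placeholder for the step (i) output; the function names, the Laplace smoothing, the clamping of weights to [0, 1], and the simple elitist GA with one-point crossover are all choices made for the example.

```python
import math
import random
from collections import Counter, defaultdict

def train_counts(X, y):
    """Collect class counts and per-attribute value counts (categorical data)."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = [set() for _ in X[0]]
    for row, label in zip(X, y):
        for i, v in enumerate(row):
            value_counts[(i, label)][v] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains

def predict(model, weights, row):
    """Step (iii): weighted NBC, argmax_c log P(c) + sum_i w_i * log P(a_i | c)."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log((nc + 1) / (n + len(class_counts)))  # Laplace-smoothed prior
        for i, v in enumerate(row):
            p = (value_counts[(i, c)][v] + 1) / (nc + len(domains[i]))
            score += weights[i] * math.log(p)
        if score > best_score:
            best, best_score = c, score
    return best

def accuracy(model, weights, X, y):
    """Fitness: fraction of instances classified correctly."""
    return sum(predict(model, weights, r) == t for r, t in zip(X, y)) / len(y)

def clamp(w):
    return min(1.0, max(0.0, w))

def ga_search(model, X_val, y_val, seed_weights, pop_size=20, generations=30, rng=None):
    """Step (ii): elitist GA over weight vectors, seeded from the initial weights."""
    rng = rng or random.Random(0)
    d = len(seed_weights)
    # Step (i) stand-in: seed_weights is assumed to come from the IF computation.
    population = [list(seed_weights)] + [
        [clamp(w + rng.gauss(0, 0.1)) for w in seed_weights]
        for _ in range(pop_size - 1)
    ]

    def fitness(w):
        return accuracy(model, w, X_val, y_val)

    for _ in range(generations):
        elite = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, d) if d > 1 else 0   # one-point crossover
            child = a[:cut] + b[cut:]
            j = rng.randrange(d)                         # mutate a single gene
            child[j] = clamp(child[j] + rng.gauss(0, 0.1))
            children.append(child)
        population = elite + children
    return max(population, key=fitness)

if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "hot"),
         ("rain", "mild"), ("sunny", "hot"), ("rain", "mild")]
    y = ["no", "no", "yes", "yes", "no", "yes"]
    model = train_counts(X, y)
    best = ga_search(model, X, y, seed_weights=[0.5, 0.5], pop_size=10, generations=5)
    print(best, accuracy(model, best, X, y))
```

Because the best half of each generation is carried over unchanged, the best fitness in the population never decreases, so the returned weights score at least as well as the seed weights on the data used for fitness. In practice the fitness would be evaluated on held-out or cross-validated data rather than on the training set.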


Summary

INTRODUCTION

The 21st century is characterized by an information explosion. With the popularization and application of internet technology, data acquisition has become increasingly convenient. Hall [8] proposed an attribute weighting method for NBC with correlation-based feature selection (CFS), where the attribute weights were inversely proportional to the order of attributes in CFS. Bao [17] proposed a genetic algorithm (GA)-based weighted NBC. In this algorithm, attribute weights calculated from information theory and algebraic theory are set as the initial population, and the classification accuracy is defined as the fitness function. Existing correlation-based attribute weights cannot express causal relationships, which are an important feature in NBC. We combine the ideas of the filter method and the wrapper method to propose a new attribute weighting approach for the weighted NBC, namely IFG-WNBC, in which causal IF measures the importance of attributes from the perspective of causality and an improved GA searches for the optimal weights.
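
For orientation, attribute-weighted naive Bayes is commonly written in the form below. The notation is an assumption introduced here and is not taken from the paper: $c$ ranges over the class labels in $C$, $a_i$ is the value of the $i$-th of $m$ attributes for instance $x$, and $w_i$ is the weight of that attribute.

```latex
\hat{c}(x) = \arg\max_{c \in C} \; P(c) \prod_{i=1}^{m} P(a_i \mid c)^{w_i}
```

Setting every $w_i = 1$ recovers the classic NBC, so the weighting methods discussed above differ only in how the $w_i$ are chosen.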

THEORETICAL DERIVATION
FITNESS FUNCTION
Findings
CONCLUSION AND FUTURE STUDY