Abstract

Today, data are typically stored in relational structures. The usual approach to mining such data is to join several relations into a single relation via foreign-key links, a process known as flattening. Flattening can cause problems such as long run times, data redundancy, and statistical skew in the data. This raises the critical issue of how to mine data directly across multiple relations. The answer to this issue is the approach called multi-relational data mining (MRDM). A further issue is that irrelevant or redundant attributes in a relation may contribute nothing to classification accuracy. Feature selection is therefore an essential data pre-processing step in multi-relational data mining: by filtering out irrelevant or redundant features from the relations, we improve classification accuracy, achieve good time performance, and improve the comprehensibility of the resulting models. We propose an entropy-based feature selection method for the multi-relational naive Bayesian classifier. The method uses the InfoDist measure and Pearson's correlation as parameters to filter out irrelevant and redundant features from the multi-relational database and thereby enhance classification accuracy. We evaluated our algorithm on the PKDD financial dataset and achieved better accuracy than the existing feature selection methods.
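As a rough illustration of the two filtering criteria named above, the following is a minimal sketch combining entropy-based relevance (information gain) with Pearson-correlation-based redundancy removal over a single feature table. The column names, thresholds, and the exact form of the InfoDist measure are assumptions for illustration only, not the paper's implementation; features are assumed to be numerically coded (e.g. discretized integers) so that Pearson correlation is defined.

```python
# Sketch: entropy-based relevance + Pearson-correlation redundancy filtering.
# Thresholds and column handling are illustrative assumptions, not the paper's code.
import numpy as np
import pandas as pd


def entropy(series: pd.Series) -> float:
    """Shannon entropy of a discrete-valued column."""
    probs = series.value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())


def information_gain(feature: pd.Series, target: pd.Series) -> float:
    """Reduction in class entropy after splitting on the feature."""
    total = entropy(target)
    conditional = sum(
        (len(subset) / len(target)) * entropy(subset)
        for _, subset in target.groupby(feature)
    )
    return total - conditional


def select_features(df: pd.DataFrame, target_col: str,
                    gain_threshold: float = 0.01,
                    corr_threshold: float = 0.9) -> list:
    """Keep features that are relevant (high gain) and pairwise non-redundant."""
    target = df[target_col]
    candidates = [c for c in df.columns if c != target_col]
    # 1) Relevance: drop features with negligible information gain w.r.t. the class.
    relevant = [c for c in candidates
                if information_gain(df[c], target) >= gain_threshold]
    # 2) Redundancy: drop a feature if it is highly correlated with one already kept.
    selected = []
    for col in relevant:
        if all(abs(df[col].corr(df[kept])) < corr_threshold for kept in selected):
            selected.append(col)
    return selected
```

In a multi-relational setting, such a filter would be applied per relation before the selected features are passed to the multi-relational naive Bayesian classifier.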

Highlights

  • The term data mining refers to the extraction of valuable knowledge from large amounts of data

  • Multi-relational data mining looks for patterns that involve multiple relations in a relational database. Its main difference from traditional data mining approaches is that it does not need to transform the data into a single table; it learns from the data in its original form, preserving its structure and incorporating that structure into the learning process [2,3]

  • We used accuracy as our comparison parameter and achieved better accuracy than the existing methods


Summary

Introduction

The term data mining refers to the extraction of valuable knowledge from large amounts of data. Many classification approaches can only be applied to a single relation. Applying them to multi-relational data typically requires transforming the data into a single table through flattening and feature construction, a process known as propositionalization. Many of these methods are heuristic, and flattening can cause problems such as long run times and statistical skew in the data. Multi-relational data mining instead looks for patterns that involve multiple relations in a relational database. Its main difference from traditional data mining approaches is that it does not need to transform the data into a single table; it learns from the data in its original form, preserving its structure and incorporating that structure into the learning process [2,3].
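To make the flattening step concrete, here is a minimal sketch of propositionalization with a foreign-key join. The relations and column names (customers, orders, cust_id) are hypothetical examples, not taken from the paper or the PKDD financial dataset.

```python
# Sketch of "flattening": joining two relations on a foreign key into one wide table.
import pandas as pd

customers = pd.DataFrame({
    "cust_id": [1, 2],
    "region": ["north", "south"],
    "label": ["good", "bad"],      # class attribute lives in the target relation
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "cust_id": [1, 1, 2],          # foreign key into customers
    "amount": [120.0, 80.0, 45.0],
})

# Propositionalization: one join per foreign-key link yields a single table.
# Each customer row is duplicated once per matching order, which produces the
# data redundancy and statistical skew that the multi-relational approach avoids.
flat = customers.merge(orders, on="cust_id", how="left")
print(flat)
```

A multi-relational method keeps customers and orders as separate relations (for example, via tuple ID propagation) rather than materializing the joined table above.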

Relational databases
Transaction Disposition
Semantic relationship graph
Tuple ID propagation
Feature selection process
Our proposed entropy-based feature selection algorithm
Related work (compared by author name, year of publication, and classifier: FOIL, TILDE)
Experiments, Results and Discussion
Conclusion and Future Work
