Abstract

Data quality for categorical attributes is a difficult problem that has not received as much attention as its numerical counterpart. Our basic idea is to employ association rules for the purpose of data quality measurement. Strong rule generation is an important area of data mining. Association rule mining can be considered a multiobjective problem rather than a single-objective one. The main area of concentration is the rules generated by association rule mining using a genetic algorithm. The advantage of using a genetic algorithm to discover high-level prediction rules is that it performs a global search and copes better with attribute interaction than the greedy rule induction algorithms often used in data mining. The genetic algorithm based approach utilizes the linkage between association rules and feature selection. In this paper, we put forward a multiobjective genetic algorithm approach for data quality on categorical attributes. The results show that our approach performs well on objectives such as accuracy, completeness, comprehensibility, and interestingness.

Highlights

  • Data mining is the most instrumental tool in discovering knowledge from transactions [1, 2]. The most important application of data mining is discovering association rules.

  • Data quality mining can be defined as the deliberate application of data mining techniques for the purpose of data quality measurement and improvement [3].

  • We describe a first approach to employ Association Rules for the purpose of data quality mining

Summary

INTRODUCTION

Data mining is the most instrumental tool in discovering knowledge from transactions [1, 2]. The most important application of data mining is discovering association rules. Data quality mining is one of the most important tasks in the data mining community and is an active research area, because the data being generated and stored in the databases of organizations are already enormous and continue to grow very fast. This large amount of stored data normally contains valuable hidden knowledge which, if harnessed, could be used to improve decision making. Data quality mining using a genetic algorithm is applied to categorical attributes. It is a widely used technique in which data points are partitioned into groups in such a way that points in the same group are similar to one another. For example, if age is discretized into steps of 2 years, we would probably find the rules

Age(X, 18...19) and lives(X, Lausanne) → profession(X, student)
Age(X, 20...21) and lives(X, Lausanne) → profession(X, student)

These could be expressed as a single rule:

Age(X, 18...21) and lives(X, Lausanne) → profession(X, student)

This is more compact but requires a different discretization.
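The merging of rules over adjacent age intervals in the example above can be sketched in a few lines of Python. This is only an illustration and not the method proposed in the paper: the `Rule` class and the `merge_adjacent_rules` function are hypothetical names introduced here, and the sketch assumes integer age bounds and rules that share the same city and profession.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    """Hypothetical encoding of: Age(X, lo...hi) and lives(X, city) -> profession(X, profession)."""
    age_lo: int
    age_hi: int
    city: str
    profession: str


def merge_adjacent_rules(rules: List[Rule]) -> List[Rule]:
    """Merge rules that differ only in age intervals that touch or overlap."""
    merged: List[Rule] = []
    # Sort so rules with the same city and profession are neighbours, ordered by age.
    for rule in sorted(rules, key=lambda r: (r.city, r.profession, r.age_lo)):
        last = merged[-1] if merged else None
        if (
            last is not None
            and last.city == rule.city
            and last.profession == rule.profession
            and rule.age_lo <= last.age_hi + 1  # intervals are contiguous or overlapping
        ):
            # Extend the previous rule's interval instead of keeping two rules.
            merged[-1] = Rule(
                last.age_lo, max(last.age_hi, rule.age_hi), last.city, last.profession
            )
        else:
            merged.append(rule)
    return merged


rules = [
    Rule(18, 19, "Lausanne", "student"),
    Rule(20, 21, "Lausanne", "student"),
]
compact = merge_adjacent_rules(rules)
print(compact)
```

In this sketch the two rules collapse into a single rule covering ages 18 to 21. Any quality measure of the merged rule, such as its support or confidence, would still have to be recomputed on the data, since merging only changes how the rule is written.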

ASSOCIATION RULES
GENETIC ALGORITHM
Genetic Operators for Rule Discovery
Selection
Michigan versus Pittsburgh Approach
EVALUATING THE QUALITY OF A RULE
THE PROPOSED ALGORITHM
RESULTS AND DISCUSSION
CONCLUSION & FUTURE WORK
