Abstract

Existing Conditional Functional Dependency (CFD) discovery algorithms require a well-prepared training dataset, which makes them difficult to apply to large, low-quality datasets. To handle the volume of big data, we develop sampling algorithms that obtain a small but representative training set. To address the low quality of big data, we design fault-tolerant rule-discovery and conflict-resolution algorithms. We also propose a parameter-selection strategy to ensure the effectiveness of the CFD discovery algorithms. Experimental results demonstrate that our method can discover effective CFD rules on billion-tuple data within a reasonable time.
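
To make the setting concrete, here is a minimal sketch of what a single CFD rule looks like and how one tuple can violate it. The schema, pattern values, and helper function are purely illustrative (not taken from the paper), and for simplicity the check covers only constant patterns; variable patterns additionally require comparing pairs of tuples.

```python
# Illustrative sketch only: the attribute names and the pattern tableau row
# below are hypothetical. A CFD is a functional dependency (LHS -> RHS) plus
# a pattern tableau; constants restrict the rule to matching tuples, and '_'
# marks a wildcard.
cfd = {
    "lhs": ["country_code", "area_code"],
    "rhs": "city",
    "pattern": {"country_code": "44", "area_code": "131", "city": "Edinburgh"},
}

def violates(tup, cfd):
    """True if tup matches every LHS constant but disagrees with the RHS."""
    p = cfd["pattern"]
    if any(p[a] != "_" and tup[a] != p[a] for a in cfd["lhs"]):
        return False                       # pattern does not apply to this tuple
    rhs = cfd["rhs"]
    return p[rhs] != "_" and tup[rhs] != p[rhs]

t = {"country_code": "44", "area_code": "131", "city": "Glasgow"}
print(violates(t, cfd))                    # True: matching LHS, wrong city
```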

Highlights

  • With the continuing accumulation of data, databases have become increasingly large

  • Evaluation covers both the time of cleaning data with the discovered Conditional Functional Dependencies (CFDs) and the quality of the cleaned data, measured by comparing the percentage of data cleaned by the CFD sets our approach discovers on dirty data against that achieved by CFD sets obtained from the clean data

  • In Refs. [11, 14], for centrally stored relational databases, approaches are designed to automatically detect tuples that violate CFDs and Conditional Inclusion Dependencies (CINDs) via Structured Query Language (SQL) query processing, as sketched after this list
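
The following sketch illustrates that SQL-based detection idea on a toy table. The schema, data, and query are our own assumptions for illustration and are not the queries given in Refs. [11, 14]: tuples that agree on the left-hand side of a variable CFD but disagree on its right-hand side can be found with a single GROUP BY query.

```python
# Toy illustration of detecting CFD violations with SQL; all names and data
# are assumptions made for this sketch.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (country_code TEXT, area_code TEXT, city TEXT);
    INSERT INTO customer VALUES ('44','131','Edinburgh'),
                                ('44','131','Glasgow'),
                                ('01','908','New York');
""")

# For the variable CFD [country_code, area_code] -> city, tuples agreeing on
# the LHS must agree on the RHS; any group with more than one distinct city
# therefore contains violating tuples.
rows = conn.execute("""
    SELECT country_code, area_code
    FROM customer
    GROUP BY country_code, area_code
    HAVING COUNT(DISTINCT city) > 1
""").fetchall()
print(rows)  # [('44', '131')]: this group violates the dependency
```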


Summary

Introduction

With the continuing accumulation of data, databases have become increasingly large. At the same time, owing to the difficulty of manual maintenance and the variety of data sources, big data is highly likely to contain quality problems that make it difficult to use. Existing work [2] efficiently discovers high-quality rules with data-mining algorithms on small but clean datasets. A scalable method is therefore needed to mine high-quality rules from big data whose size exceeds main memory. To achieve this goal, we design a scalable and systematic algorithm. We sample the data when the dataset is larger than the memory; another purpose of sampling is to filter out dirty items and keep clean ones. A rule-discovery method suitable for big data larger than the memory requires features that existing methods do not have. We propose a method for discovering a high-quality CFD set that tolerates data-quality problems and meets user requirements on datasets larger than the memory; a sketch of the sampling idea follows.
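
The summary does not spell out the paper's one-pass sampling algorithm, so the sketch below uses standard reservoir sampling to illustrate the general idea: maintaining a fixed-size, uniformly random training sample of a dataset that is too large to hold in memory, in a single pass.

```python
# A standard reservoir-sampling (Algorithm R) sketch of one-pass sampling;
# this illustrates the general technique, not the paper's exact algorithm.
import random

def reservoir_sample(stream, k, seed=None):
    """Return k items sampled uniformly from an iterable of unknown size."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)        # fill the reservoir first
        else:
            j = rng.randint(0, i)         # keep item with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

# Usage: sample 1000 tuples from a file read line by line, never loading
# the whole dataset into memory.
# with open("big_table.csv") as f:
#     training_set = reservoir_sample(f, 1000)
```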

Background
Problem definition
Framework
Multiple-pass scan algorithm
Tuple selection criteria
One-pass sampling algorithm
DFCFD algorithm
Dealing with conflicts between CFDs
Calculating the weight of each node
Discovery of the conflict between two CFDs
Parameter Selection
Experimental settings
Performance and scalability experiments
Optimality of parameters
Test on real data
Related Work
Findings
Conclusion
