Abstract

Data currency is a temporal reference of data; it reflects the degree to which the data is current with the world it models. A currency rule is a formal rule extracted from a data set that reflects the currency order of the data tuples; it can be used for both data repairing and currency quality evaluation. Building on research on data currency repairing, the basic form of the currency rule is extended, and parallel rule extraction and update algorithms are proposed to meet the requirement of running on dynamic data sets. In addition, four data currency quality evaluation models are proposed and verified by experiments. The performance tests show that the efficiency of the parallel algorithms is significantly improved, and that the rules compliance mean (CM2) model based on the extended currency rule has the highest average precision. The extended currency rules not only improve efficiency and adaptability, but also provide more valuable features for data quality evaluation.

Highlights

  • Currency is an important feature of data: it is a temporal reference that reflects the degree to which the data is current with the world it models.

  • Data sets often contain a lot of time-disrupted data. If we cannot identify which record is the "latest", data queries may return incorrect results and data analysis may lead to ambiguous conclusions, followed by data quality degradation and data value reduction.

  • This paper extends the basic form of the currency rule. The extended rules can be updated incrementally on dynamic data sets, and the newly added path length attribute provides more effective information for currency quality evaluation.


Summary

Introduction

Currency is an important feature of data: it is a temporal reference that reflects the degree to which the data is current with the world it models. The time attributes of multi-source heterogeneous data are often inaccurate, which poses great challenges to data quality and data value [29]. If the timestamp is incomplete or inaccurate, the order of the records cannot be determined, which brings great difficulties to data analysis and value-added applications. For example, suppose the semester of Eve's "Database" record is unknown. Because Eve took "Data Structure" in the second semester, we can infer from the rule "Data Structure→Database", which holds for Alice and Bob, that the semester of Eve's "Database" record is not earlier than that of "Data Structure". Although it is not certain whether the semester is 3 or 4, it is very likely greater than 2, so we can determine the order of Eve's two records.
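The inference in the Eve example can be sketched in a few lines of Python. This is only an illustrative sketch, not the extraction algorithm proposed in the paper: the function names `extract_rules` and `order_pair` are hypothetical, and the semester values for Alice and Bob are assumed for illustration. A rule "a→b" is kept only when some entity shows a before b and no entity shows the reverse.

```python
from collections import defaultdict

# Assumed toy data: (student, course, semester). None marks a missing timestamp.
# Alice's and Bob's semester values are illustrative, not taken from the paper.
records = [
    ("Alice", "Data Structure", 2), ("Alice", "Database", 3),
    ("Bob", "Data Structure", 2), ("Bob", "Database", 4),
    ("Eve", "Data Structure", 2), ("Eve", "Database", None),
]


def extract_rules(records):
    """Return pairs (a, b) meaning 'a precedes b', dropping contradicted pairs."""
    by_entity = defaultdict(list)
    for entity, value, ts in records:
        if ts is not None:  # only records with known timestamps yield evidence
            by_entity[entity].append((value, ts))
    seen = set()
    for rows in by_entity.values():
        for a, ta in rows:
            for b, tb in rows:
                if ta < tb:
                    seen.add((a, b))
    # Keep a rule only if the opposite order was never observed.
    return {(a, b) for (a, b) in seen if (b, a) not in seen}


def order_pair(rules, first, second):
    """Order two values of one entity using the rules; None if undetermined."""
    if (first, second) in rules:
        return (first, second)
    if (second, first) in rules:
        return (second, first)
    return None


rules = extract_rules(records)
eve_order = order_pair(rules, "Database", "Data Structure")
print(eve_order)
```

Here the rule learned from Alice and Bob is enough to place Eve's "Data Structure" record before the "Database" record, even though the missing semester itself is not recovered.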

Literature Review
Currency Rule Extraction Algorithm
Rule Sets Merging Algorithm
Evaluation
Parallel Test for Rules Extraction Algorithm
Non-Parallel Test for Rule Sets Merging Algorithm
Non-Parallel Test for Currency Evaluation
Evaluation and Analysis of Data
