Abstract

Missing data values are a widespread problem for both researchers and industrial developers. There are two general approaches to dealing with missing values in databases: they can be either ignored (removed) or imputed (filled in) with new values (Farhangfar et al. in IEEE Trans Syst Man Cybern-Part A: Syst Hum 37(5):692–709, 2007). For some SQL tables, it is possible that some candidate key of the table is not null-free, and this needs to be handled. Possible keys and certain keys were introduced to deal with this situation in Köhler et al. (VLDB J 25(4):571–596, 2016). In the present paper, we introduce an intermediate concept called strongly possible keys, which is based on a data mining approach using only information already contained in the SQL table. A strongly possible key is a key that holds for some possible world obtained by replacing each occurrence of a null with a value already appearing in the corresponding attribute. Implication among strongly possible keys is characterized, and Armstrong tables are constructed. An algorithm that verifies a strongly possible key using bipartite matching is given. A connection between the matroid intersection problem and systems of strongly possible keys is established. For cases in which no strongly possible key holds, an approximation notion is introduced that measures how close a given set of attributes is to being a strongly possible key, using the g_3 measure, and we derive its component version g_4. Analytical comparisons between the two measures are given.

Highlights

  • We introduce and define strongly possible keys over database tables that contain occurrences of nulls

  • We show that deciding whether a given set of attributes is a strongly possible key can be done by finding a matching in a bipartite graph, so Hall’s condition applies naturally

  • We show that deciding whether a given system of sets of attributes is a system of strongly possible keys for a given table can be done using matroid intersection
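
The matching-based check in the second highlight can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm as published: it assumes nulls are represented as `None`, that rows are Python dicts, and that the right-hand side of the bipartite graph is the full product of the visible values of the key attributes (which can grow exponentially with the key size). The function names are hypothetical.

```python
from itertools import product

NULL = None  # assumption: a null is represented as None


def visible_domain(table, attr):
    """Values that actually occur (non-null) in column `attr`."""
    return sorted({row[attr] for row in table if row[attr] is not NULL})


def is_strongly_possible_key(table, key):
    """Return True if every row can be matched to a distinct total tuple.

    Left vertices: rows of the table projected onto `key`.
    Right vertices: total tuples built from the visible domains.
    Edge: the candidate agrees with the row on all non-null positions.
    """
    domains = [visible_domain(table, a) for a in key]
    candidates = list(product(*domains))

    adjacency = []
    for row in table:
        proj = tuple(row[a] for a in key)
        adjacency.append([
            j for j, cand in enumerate(candidates)
            if all(v is NULL or v == w for v, w in zip(proj, cand))
        ])

    owner = {}  # candidate index -> row index currently matched to it

    def try_assign(i, seen):
        # Standard augmenting-path search (Kuhn's algorithm).
        for j in adjacency[i]:
            if j in seen:
                continue
            seen.add(j)
            if j not in owner or try_assign(owner[j], seen):
                owner[j] = i
                return True
        return False

    # The key holds in some world iff the matching covers every row.
    return all(try_assign(i, set()) for i in range(len(table)))


if __name__ == "__main__":
    ok = [{"A": 1, "B": NULL}, {"A": 1, "B": 2}, {"A": 2, "B": 3}]
    bad = [{"A": 1, "B": NULL}, {"A": 1, "B": 2}, {"A": 2, "B": NULL}]
    print(is_strongly_possible_key(ok, ["A", "B"]))
    print(is_strongly_possible_key(bad, ["A", "B"]))
```

In the second example table the only visible value of B is 2, so the first two rows compete for the same total tuple (1, 2) and Hall's condition fails.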

Summary

Introduction

In data warehousing, when different sources of raw data are merged, some attributes may exist in some of the sources but not in others. This makes it necessary to treat keys over incomplete tables. We define a strongly possible key as a key that is satisfied by some possible world obtained by replacing each occurrence of a null with one of the values that already exist in the corresponding attribute. There are incomplete SQL tables that do not have certain keys; for example, see Fig. 3b. In such cases, strongly possible keys make the least possible assumption about attribute domains, as they only use values that are already present, in contrast to possible keys, which may take any value from a (possibly infinite) predefined domain. Possible keys over relational data with null occurrences in the key attributes are studied in Sect.
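
To make the definition above concrete, the following brute-force sketch enumerates every possible world in which nulls are filled with values already present in the same column, and tests whether the attribute set is a key in at least one of them. It is only an illustration of the definition under stated assumptions (nulls are `None`, rows are dicts, the function name is hypothetical) and runs in exponential time, so it is not a substitute for the matching-based algorithm the paper describes.

```python
from itertools import product


def holds_in_some_world(table, key):
    """Check the definition directly by trying every null replacement.

    Only the projection onto `key` matters for key satisfaction, so
    nulls outside the key attributes are ignored here.
    """
    rows = [[row[a] for a in key] for row in table]

    # Visible domain of each key attribute: the non-null values present.
    domains = [
        sorted({r[i] for r in rows if r[i] is not None})
        for i in range(len(key))
    ]

    # Positions (row index, column index) that hold a null.
    slots = [
        (r, i)
        for r, row in enumerate(rows)
        for i, value in enumerate(row)
        if value is None
    ]

    # Each choice of fill values corresponds to one possible world.
    for fill in product(*(domains[i] for _, i in slots)):
        world = [list(r) for r in rows]
        for (r, i), value in zip(slots, fill):
            world[r][i] = value
        # `key` is a key in this world if all projected rows differ.
        if len({tuple(r) for r in world}) == len(world):
            return True
    return False


if __name__ == "__main__":
    table = [
        {"A": 1, "B": None},
        {"A": 1, "B": 2},
        {"A": 2, "B": 3},
    ]
    print(holds_in_some_world(table, ["A", "B"]))
```

Here the null in the first row can be replaced by 3, a value that already occurs in column B, which makes all three rows distinct on (A, B).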

Related Work
Preliminaries
Strongly Possible Keys
Implication Problem
Checking a Single Strongly Possible Key
System of Multiple Strongly Possible Keys
Necessary Conditions
         A2
  t1  0  0
  t2  0  1
  t3  1  0
  t4  1  1
  t5  2  0
  t6  2  1
  t7  3  0
  t8  4  1
Application to Real-Life Datasets
Strongly Possible Keys Approximation
Analytical Comparison
Findings
Conclusion and Future Directions