The Bounded Cardinality Normal Form for the Logical Design of Relational Database Schemata

  • Abstract
  • References
  • Similar Papers
Abstract
Translate article icon Translate Article Star icon
Take notes icon Take Notes

The goal of classical normalization is to maintain data consistency under updates, with a minimum level of effort. Given functional dependencies (FDs) alone, this goal is only achievable in the special case an FD-preserving Boyce–Codd Normal Form (BCNF) decomposition exists. As we show, in all other cases the level of effort can be neither controlled nor quantified. In response, we establish the ℓ-Bounded Cardinality Normal Form, parameterized by a positive integer ℓ. For every ℓ, the normal form condition requires from every instance that every value combination over the left-hand side of every non-trivial FD does not occur in more than ℓ tuples. BCNF is captured when ℓ =1. We show that schemata in ℓ-Bounded Cardinality Normal Form characterize instances in which updates to at most ℓ occurrences of any redundant data value are sufficient to maintain data consistency. In fact, for the smallest ℓ in which a schema is in ℓ-Bounded Cardinality Normal Form we capture an equilibrium between worst-case update inefficiency and best-case join efficiency, where some redundant data value can be joined with up to ℓ other data values. We then establish algorithms that compute schemata in ℓ-Bounded Cardinality Normal Form for the smallest level ℓ attainable across all lossless, FD-preserving decompositions. Additional algorithms (i) attain even smaller levels of effort based on the loss of some FDs, and (ii) decompose schemata based on prioritized FDs that cause high levels of effort. Our framework informs de-normalization already during logical design. In particular, every materialized view exhibits an equilibrium level ℓ that quantifies its worst-case incremental maintenance cost and its best-case support for join queries. Experiments with synthetic and real-world data illustrate which properties the schemata have that result from our algorithms, and how these properties predict the performance of update and query operations on instances over the schemata, without and with materialized views. We further demonstrate how our framework can automate the design of data warehouses by mining data for dimensions that exhibit high levels of data redundancy. In an effort to align data and the business rules that govern them, we use lattice theory to characterize ℓ-Bounded Cardinality Normal Form on database instances and schemata. As a consequence, any difference in constraints observed at instance and schema levels provides an opportunity to improve data quality, insight derived from analytics, and database performance.

ReferencesShowing 10 of 49 papers
  • Cite Count Icon 5
  • 10.1109/tkde.2019.2911580
Equilibrium of Redundancy in Relational Model for Optimized Data Retrieval
  • Jan 1, 2020
  • IEEE Transactions on Knowledge and Data Engineering
  • Nemanja Kojic + 1 more

  • Cite Count Icon 58
  • 10.1002/j.1538-7305.1983.tb03470.x
An Algebraic Theory of Relational Databases
  • Dec 1, 1983
  • Bell System Technical Journal
  • T T Lee

  • Cite Count Icon 9
  • 10.1007/s00778-018-0511-z
Probabilistic Cardinality Constraints
  • Jul 2, 2018
  • The VLDB Journal
  • Tania Roblot + 2 more

  • Cite Count Icon 53
  • 10.1007/s002360050157
Semantic foundations of 4NF in relational database design
  • Mar 1, 1999
  • Acta Informatica
  • Millist W Vincent

  • Cite Count Icon 91
  • 10.14778/3192965.3192968
Efficient discovery of approximate dependencies
  • Mar 1, 2018
  • Proceedings of the VLDB Endowment
  • Sebastian Kruse + 1 more

  • Cite Count Icon 83
  • 10.1561/1900000020
Materialized Views
  • Jan 1, 2011
  • Foundations and Trends® in Databases
  • Rada Shirkova

  • Cite Count Icon 30
  • 10.1016/0020-0190(79)90082-6
Testing for existence of a covering Boyce-Codd normal form
  • Jan 1, 1979
  • Information Processing Letters
  • Sylvia L Osborn

  • Open Access Icon
  • Cite Count Icon 128
  • 10.1016/s0306-4379(02)00021-2
Why is the snowflake schema a good data warehouse design?
  • Jan 17, 2003
  • Information Systems
  • Mark Levene + 1 more

  • Cite Count Icon 49
  • 10.1109/icde.1996.492110
Towards the reverse engineering of renormalized relational databases
  • Mar 25, 2019
  • J.-M Petit + 3 more

  • Cite Count Icon 7
  • 10.1145/1400214.1400240
Conceptually modeling windows and bounds for space and time in database constraints
  • Nov 1, 2008
  • Communications of the ACM
  • Faiz Currim + 1 more

Similar Papers
  • Research Article
  • 10.1145/3725362
Synthesizing Third Normal Form Schemata that Minimize Integrity Maintenance and Update Overheads: Parameterizing 3NF by the Numbers of Minimal Keys and Functional Dependencies
  • Jun 17, 2025
  • Proceedings of the ACM on Management of Data
  • Zhuoxing Zhang + 1 more

State-of-the-art relational schema design generates a lossless, dependency-preserving decomposition into Third Normal Form (3NF), that is in Boyce-Codd Normal Form (BCNF) whenever possible. In particular, dependency-preservation ensures that data integrity can be maintained on individual relation schemata without having to join them, but may need to tolerate a priori unbounded levels of data redundancy and integrity faults. As our main contribution we parameterize 3NF schemata by the numbers of minimal keys and functional dependencies they exhibit. Conceptually, these parameters quantify, already at schema design time, the effort necessary to maintain data integrity, and allow us to break ties between 3NF schemata. Computationally, the parameters enable us to optimize normalization into 3NF according to different strategies. Operationally, we show through experiments that our optimizations translate from the logical level into significantly smaller update overheads during integrity maintenance. Hence, our framework provides access to parameters that guide the computation of logical schema designs which reduce operational overheads.

  • Conference Article
  • Cite Count Icon 4
  • 10.1109/cmpsac.1979.762527
Decomposition of a relation into fourth normal forms
  • Jan 1, 1979
  • R.Y Fadous

In the relational model, various relationships may exist between the attributes in a relation. The relationships are described by functional and multivalued dependencies. The functional dependencies are the basic elements for the decomposition of a relation into a family of relations in second, third, and Boyce-Codd normal forms. The multivalued dependencies, which include the functional dependencies as a special case, extend the decomposition further into a collection of relations in fourth normal form. This paper examines the conditions under which a relation in Boyce-Codd normal form should be decomposed further into fourth normal form and a fourth normal form normalization procedure is presented.

  • Research Article
  • Cite Count Icon 53
  • 10.1007/s002360050157
Semantic foundations of 4NF in relational database design
  • Mar 1, 1999
  • Acta Informatica
  • Millist W Vincent

The issue of providing a formal justification for the use of fourth normal form (4NF) in relational database design is investigated. The motivation and formal definitions for three goals of database design are presented. These goals are the elimination of: redundancy, key-based update anomalies and fact-based replacement anomalies. It is then shown that, depending on the type of constraints permitted, either Boyce-Codd normal form (BCNF) or 4NF are the exact conditions needed to ensure most of the design goals. However, it is also shown that the conditions required to ensure the absence of a particular class of key-based update anomaly are new normal forms which have not previously been identified. In particular, for the case where the only constraints are functional dependencies (FDs), it is shown that the required normal form is a new normal form that is stronger than third normal form (3NF) yet weaker than BCNF. Similarly, in the more general case where both FD and multivalued dependencies (MVDs) are present, the required normal form is a new normal form that is weaker than 4NF.

  • Book Chapter
  • Cite Count Icon 1
  • 10.1016/b978-0-12-382020-4.00010-0
6 - Normalization
  • Jan 1, 2011
  • Database Modeling and Design
  • Toby J Teorey + 3 more

6 - Normalization

  • Research Article
  • Cite Count Icon 32
  • 10.1145/310701.310712
Database design for incomplete relations
  • Mar 1, 1999
  • ACM Transactions on Database Systems
  • Mark Levene + 1 more

Although there has been a vast amount of research in the area of relational database design, to our knowledge, there has been very little work that considers whether this theory is still valid when relations in the database may be incomplete. When relations are incomplete and thus contain null values the problem of whether satisfaction is additive arises. Additivity is the property of the equivalence of the satisfaction of a set of functional dependencies (FDs) F with the individual satisfaction of each member of F in an incomplete relation. It is well known that in general, satisfaction of FDs is not additive. Previously we have shown that satisfaction is additive if and only if the set of FDs is monodependent. We conclude that monodependence is a fundamental desirable property of a set of FDs when considering incomplete information in relational database design. We show that, when the set of FDs F either satifies the intersection property or the split-freeness property, then the problem of finding an optimum cover of F can be solved in polynomial time in the size of F; in general, this problem is known to be NP-complete. We also show that when F satisfies the split-freeness property then deciding whether there is a superkey of cardinality k or less can be solved in polynomial time in the size of F, since all the keys have the same cardinality. If F only satisfies the intersection property then this problem is NP-complete, as in the general case. Moreover, we show that when F either satisfies the intersection property or the split-freeness property then deciding whether an attribute is prime can be solved in polynomial time in the size of F; in general, this problem is known to be NP-complete. Assume that a relation schema R is an appropriate normal form with respect to a set of FDs F. We show that when F satisfies the intersection property then the notions of second normal form and third normal form are equivalent. We also show that when R is in Boyce-Codd Normal Form (BCNF), then F is monodependent if and only if either there is a unique key for R, or for all keys X for R, the cardinality of X is one less than the number of attributes associated with R. Finally, we tackle a long-standing problem in relational database theory by showing that when a set of FDs F over R satisfies the intersection property, it also satisfies the split-freeness property (i.e., is monodependent), if and only if every lossless join decomposition of R with respect to F is also dependecy preserving. As a corollary of this result we are able to show that when F satisfies the intersection property, it also satisfies the intersection property, it also satisfies the split-freeness property(i.e., is monodependent), if and only if every lossless join decomposition of R, which is in BCNF, is also dependency preserving. Our final result is that when F is monodependent, then there exists a unique optimum lossless join decomposition of R, which is in BCNF, and is also dependency preserving. Furthermore, this ultimate decomposition can be attained in polynomial time in the size of F.

  • Research Article
  • Cite Count Icon 38
  • 10.1016/j.is.2018.04.001
SQL schema design: foundations, normal forms, and normalization
  • Apr 13, 2018
  • Information Systems
  • Henning Köhler + 1 more

SQL schema design: foundations, normal forms, and normalization

  • Research Article
  • Cite Count Icon 22
  • 10.1016/0020-0190(89)90065-3
Boyce–Codd normal form and object normal forms
  • Jul 1, 1989
  • Information Processing Letters
  • Joachim Biskup

Boyce–Codd normal form and object normal forms

  • Research Article
  • Cite Count Icon 55
  • 10.1145/319732.319749
A new normal form for the design of relational database schemata
  • Sep 1, 1982
  • ACM Transactions on Database Systems
  • Carlo Zaniolo

This paper addresses the problem of database schema design in the framework of the relational data model and functional dependencies. It suggests that both Third Normal Form (3NF) and Boyce-Codd Normal Form (BCNF) supply an inadequate basis for relational schema design. The main problem with 3NF is that it is too forgiving and does not enforce the separation principle as strictly as it should. On the other hand, BCNF is incompatible with the principle of representation and prone to computational complexity. Thus a new normal form, which lies between these two and captures the salient qualities of both is proposed. The new normal form is stricter than 3NF, but it is still compatible with the representation principle. First a simpler definition of 3NF is derived, and the analogy of this new definition to the definition of BCNF is noted. This analogy is used to derive the new normal form. Finally, it is proved that Bernstein's algorithm for schema design synthesizes schemata that are already in the new normal form.

  • Book Chapter
  • 10.1016/b978-012369379-2/50002-9
Chapter 2 - Normalization
  • Jan 1, 2005
  • Joe Celko's SQL for Smarties
  • Joe Celko

Chapter 2 - Normalization

  • Research Article
  • Cite Count Icon 7
  • 10.1016/j.dam.2007.02.014
Non-deterministic ideal operators: An adequate tool for formalization in Data Bases
  • Jun 16, 2007
  • Discrete Applied Mathematics
  • P Cordero + 3 more

Non-deterministic ideal operators: An adequate tool for formalization in Data Bases

  • Conference Article
  • Cite Count Icon 22
  • 10.1145/2882903.2915239
SQL Schema Design
  • Jun 14, 2016
  • Henning Köhler + 1 more

Normalization helps us find a database schema at design time that can process the most frequent updates efficiently at run time. Unfortunately, relational normalization only works for idealized database instances in which duplicates and null markers are not present. On one hand, these features occur frequently in real-world data compliant with the industry standard SQL, and especially in modern application domains. On the other hand, the features impose challenges that have made it impossible so far to extend the existing forty year old normalization framework to SQL. We introduce a new class of functional dependencies and show that they provide the right notion for SQL schema design. Axiomatic and linear-time algorithmic characterizations of the associated implication problem are established. These foundations enable us to propose a Boyce-Codd normal form for SQL. Indeed, we justify the normal form by showing that it permits precisely those SQL instances which are free from data redundancy. Unlike the relational case, there are SQL schemata that cannot be converted into Boyce-Codd normal form. Nevertheless, for an expressive sub-class of our functional dependencies we establish a normalization algorithm that always produces a schema in Value-Redundancy free normal form. This normal form permits precisely those instances which are free from any redundant data value occurrences other than the null marker. Experiments show that our functional dependencies occur frequently in real-world data and that they are effective in eliminating redundant values from these data sets without loss of information.

  • Book Chapter
  • 10.1007/978-3-031-27177-9_1
Introduction
  • Jan 1, 2023
  • Shaoxu Song + 1 more

Data dependencies, such as functional dependencies (FDs), have been long recognized as integrity constraints in databases [42]. They are firstly utilized in database design [3]. Conventionally, data dependencies are extracted from application requirements for database standardization, and used in database design to ensure data quality. For instance, functional dependencies (FDs) [16] are employed to evaluate whether a relation is in third normal form (3NF) [42] or Boyce-Codd normal form (BCNF) [43]. FDs are extended to multivalued dependencies (MVDs) [13], i.e., every FD is also an MVD, in order to test whether a relation is in fourth normal form (4NF) [55]. Such data dependencies are also used in database query optimization [53, 86].

  • Research Article
  • Cite Count Icon 4
  • 10.1016/j.is.2023.102208
Cardinality constraints and functional dependencies in SQL: Taming data redundancy in logical database design
  • Mar 22, 2023
  • Information Systems
  • Sebastian Link + 4 more

Cardinality constraints and functional dependencies in SQL: Taming data redundancy in logical database design

  • Research Article
  • Cite Count Icon 441
  • 10.1145/320064.320066
Computational problems related to the design of normal form relational schemas
  • Mar 1, 1979
  • ACM Transactions on Database Systems
  • Catriel Beeri + 1 more

Problems related to functional dependencies and the algorithmic design of relational schemas are examined. Specifically, the following results are presented: (1) a tree model of derivations of functional dependencies from other functional dependencies; (2) a linear-time algorithm to test if a functional dependency is in the closure of a set of functional dependencies; (3) a quadratic-time implementation of Bernstein's third normal form schema synthesis algorithm. Furthermore, it is shown that most interesting algorithmic questions about Boyce-Codd normal form and keys are NP -complete and are therefore probably not amenable to fast algorithmic solutions.

  • Conference Article
  • Cite Count Icon 13
  • 10.1145/2274576.2274589
A normal form for preventing redundant tuples in relational databases
  • Mar 26, 2012
  • Hugh Darwen + 2 more

We introduce a new normal form, called essential tuple normal form (ETNF), for relations in a relational database where the constraints are given by functional dependencies and join dependencies. ETNF lies strictly between fourth normal form and fifth normal form (5NF, also known as projection-join normal form). We show that ETNF, although strictly weaker than 5NF, is exactly as effective as 5NF in eliminating redundancy of tuples. Our definition of ETNF is semantic, in that it is defined in terms of tuple redundancy. We give a syntactic characterization of ETNF, which says that a relation schema is in ETNF if and only if it is in Boyce-Codd normal form and some component of every explicitly declared join dependency of the schema is a superkey.

More from: ACM Transactions on Database Systems
  • New
  • Research Article
  • 10.1145/3771733
Tuple-Independent Representations of Infinite Probabilistic Databases
  • Nov 6, 2025
  • ACM Transactions on Database Systems
  • Nofar Carmeli + 3 more

  • New
  • Research Article
  • 10.1145/3774753
Update NDP: On Offloading Modifications to Smart Storage with Transactional Guarantees in Near-Data Processing DBMS
  • Nov 4, 2025
  • ACM Transactions on Database Systems
  • Arthur Bernhardt + 4 more

  • Research Article
  • 10.1145/3774316
Uniform Operational Consistent Query Answering
  • Nov 1, 2025
  • ACM Transactions on Database Systems
  • Marco Calautti + 3 more

  • Research Article
  • 10.1145/3716378
Degree Sequence Bounds
  • Oct 25, 2025
  • ACM Transactions on Database Systems
  • Kyle Deeds + 3 more

  • Research Article
  • 10.1145/3771766
Saga++: A Scalable Framework for Optimizing Data Cleaning Pipelines for Machine Learning Applications
  • Oct 14, 2025
  • ACM Transactions on Database Systems
  • Shafaq Siddiqi + 3 more

  • Research Article
  • 10.1145/3770577
Efficient Path Oracles for Proximity Queries on Point Clouds
  • Oct 2, 2025
  • ACM Transactions on Database Systems
  • Yinzhao Yan + 1 more

  • Research Article
  • 10.1145/3734517
Any-k Algorithms for Enumerating Ranked Answers to Conjunctive Queries
  • Sep 30, 2025
  • ACM Transactions on Database Systems
  • Nikolaos Tziavelis + 2 more

  • Research Article
  • 10.1145/3760773
BISLearner: Block-Aware Index Selection using Attention-Based Reinforcement Learning for Data Analytics
  • Sep 29, 2025
  • ACM Transactions on Database Systems
  • Yulai Tong + 7 more

  • Research Article
  • 10.1145/3764583
Unveiling Logic Bugs in SPJG Query Optimizations within DBMS
  • Sep 29, 2025
  • ACM Transactions on Database Systems
  • Xiu Tang + 6 more

  • Research Article
  • 10.1145/3743130
Space-Time Tradeoffs for Conjunctive Queries with Access Patterns
  • Jul 26, 2025
  • ACM Transactions on Database Systems
  • Hangdong Zhao + 2 more

Save Icon
Up Arrow
Open/Close
  • Ask R Discovery Star icon
  • Chat PDF Star icon

AI summaries and top papers from 250M+ research sources.

Search IconWhat is the difference between bacteria and viruses?
Open In New Tab Icon
Search IconWhat is the function of the immune system?
Open In New Tab Icon
Search IconCan diabetes be passed down from one generation to the next?
Open In New Tab Icon