The Bounded Cardinality Normal Form for the Logical Design of Relational Database Schemata
The goal of classical normalization is to maintain data consistency under updates, with a minimum level of effort. Given functional dependencies (FDs) alone, this goal is only achievable in the special case an FD-preserving Boyce–Codd Normal Form (BCNF) decomposition exists. As we show, in all other cases the level of effort can be neither controlled nor quantified. In response, we establish the ℓ-Bounded Cardinality Normal Form, parameterized by a positive integer ℓ. For every ℓ, the normal form condition requires from every instance that every value combination over the left-hand side of every non-trivial FD does not occur in more than ℓ tuples. BCNF is captured when ℓ =1. We show that schemata in ℓ-Bounded Cardinality Normal Form characterize instances in which updates to at most ℓ occurrences of any redundant data value are sufficient to maintain data consistency. In fact, for the smallest ℓ in which a schema is in ℓ-Bounded Cardinality Normal Form we capture an equilibrium between worst-case update inefficiency and best-case join efficiency, where some redundant data value can be joined with up to ℓ other data values. We then establish algorithms that compute schemata in ℓ-Bounded Cardinality Normal Form for the smallest level ℓ attainable across all lossless, FD-preserving decompositions. Additional algorithms (i) attain even smaller levels of effort based on the loss of some FDs, and (ii) decompose schemata based on prioritized FDs that cause high levels of effort. Our framework informs de-normalization already during logical design. In particular, every materialized view exhibits an equilibrium level ℓ that quantifies its worst-case incremental maintenance cost and its best-case support for join queries. Experiments with synthetic and real-world data illustrate which properties the schemata have that result from our algorithms, and how these properties predict the performance of update and query operations on instances over the schemata, without and with materialized views. We further demonstrate how our framework can automate the design of data warehouses by mining data for dimensions that exhibit high levels of data redundancy. In an effort to align data and the business rules that govern them, we use lattice theory to characterize ℓ-Bounded Cardinality Normal Form on database instances and schemata. As a consequence, any difference in constraints observed at instance and schema levels provides an opportunity to improve data quality, insight derived from analytics, and database performance.
5
- 10.1109/tkde.2019.2911580
- Jan 1, 2020
- IEEE Transactions on Knowledge and Data Engineering
58
- 10.1002/j.1538-7305.1983.tb03470.x
- Dec 1, 1983
- Bell System Technical Journal
9
- 10.1007/s00778-018-0511-z
- Jul 2, 2018
- The VLDB Journal
53
- 10.1007/s002360050157
- Mar 1, 1999
- Acta Informatica
91
- 10.14778/3192965.3192968
- Mar 1, 2018
- Proceedings of the VLDB Endowment
83
- 10.1561/1900000020
- Jan 1, 2011
- Foundations and Trends® in Databases
30
- 10.1016/0020-0190(79)90082-6
- Jan 1, 1979
- Information Processing Letters
128
- 10.1016/s0306-4379(02)00021-2
- Jan 17, 2003
- Information Systems
49
- 10.1109/icde.1996.492110
- Mar 25, 2019
7
- 10.1145/1400214.1400240
- Nov 1, 2008
- Communications of the ACM
- Research Article
- 10.1145/3725362
- Jun 17, 2025
- Proceedings of the ACM on Management of Data
State-of-the-art relational schema design generates a lossless, dependency-preserving decomposition into Third Normal Form (3NF), that is in Boyce-Codd Normal Form (BCNF) whenever possible. In particular, dependency-preservation ensures that data integrity can be maintained on individual relation schemata without having to join them, but may need to tolerate a priori unbounded levels of data redundancy and integrity faults. As our main contribution we parameterize 3NF schemata by the numbers of minimal keys and functional dependencies they exhibit. Conceptually, these parameters quantify, already at schema design time, the effort necessary to maintain data integrity, and allow us to break ties between 3NF schemata. Computationally, the parameters enable us to optimize normalization into 3NF according to different strategies. Operationally, we show through experiments that our optimizations translate from the logical level into significantly smaller update overheads during integrity maintenance. Hence, our framework provides access to parameters that guide the computation of logical schema designs which reduce operational overheads.
- Conference Article
4
- 10.1109/cmpsac.1979.762527
- Jan 1, 1979
In the relational model, various relationships may exist between the attributes in a relation. The relationships are described by functional and multivalued dependencies. The functional dependencies are the basic elements for the decomposition of a relation into a family of relations in second, third, and Boyce-Codd normal forms. The multivalued dependencies, which include the functional dependencies as a special case, extend the decomposition further into a collection of relations in fourth normal form. This paper examines the conditions under which a relation in Boyce-Codd normal form should be decomposed further into fourth normal form and a fourth normal form normalization procedure is presented.
- Research Article
53
- 10.1007/s002360050157
- Mar 1, 1999
- Acta Informatica
The issue of providing a formal justification for the use of fourth normal form (4NF) in relational database design is investigated. The motivation and formal definitions for three goals of database design are presented. These goals are the elimination of: redundancy, key-based update anomalies and fact-based replacement anomalies. It is then shown that, depending on the type of constraints permitted, either Boyce-Codd normal form (BCNF) or 4NF are the exact conditions needed to ensure most of the design goals. However, it is also shown that the conditions required to ensure the absence of a particular class of key-based update anomaly are new normal forms which have not previously been identified. In particular, for the case where the only constraints are functional dependencies (FDs), it is shown that the required normal form is a new normal form that is stronger than third normal form (3NF) yet weaker than BCNF. Similarly, in the more general case where both FD and multivalued dependencies (MVDs) are present, the required normal form is a new normal form that is weaker than 4NF.
- Book Chapter
1
- 10.1016/b978-0-12-382020-4.00010-0
- Jan 1, 2011
- Database Modeling and Design
6 - Normalization
- Research Article
32
- 10.1145/310701.310712
- Mar 1, 1999
- ACM Transactions on Database Systems
Although there has been a vast amount of research in the area of relational database design, to our knowledge, there has been very little work that considers whether this theory is still valid when relations in the database may be incomplete. When relations are incomplete and thus contain null values the problem of whether satisfaction is additive arises. Additivity is the property of the equivalence of the satisfaction of a set of functional dependencies (FDs) F with the individual satisfaction of each member of F in an incomplete relation. It is well known that in general, satisfaction of FDs is not additive. Previously we have shown that satisfaction is additive if and only if the set of FDs is monodependent. We conclude that monodependence is a fundamental desirable property of a set of FDs when considering incomplete information in relational database design. We show that, when the set of FDs F either satifies the intersection property or the split-freeness property, then the problem of finding an optimum cover of F can be solved in polynomial time in the size of F; in general, this problem is known to be NP-complete. We also show that when F satisfies the split-freeness property then deciding whether there is a superkey of cardinality k or less can be solved in polynomial time in the size of F, since all the keys have the same cardinality. If F only satisfies the intersection property then this problem is NP-complete, as in the general case. Moreover, we show that when F either satisfies the intersection property or the split-freeness property then deciding whether an attribute is prime can be solved in polynomial time in the size of F; in general, this problem is known to be NP-complete. Assume that a relation schema R is an appropriate normal form with respect to a set of FDs F. We show that when F satisfies the intersection property then the notions of second normal form and third normal form are equivalent. We also show that when R is in Boyce-Codd Normal Form (BCNF), then F is monodependent if and only if either there is a unique key for R, or for all keys X for R, the cardinality of X is one less than the number of attributes associated with R. Finally, we tackle a long-standing problem in relational database theory by showing that when a set of FDs F over R satisfies the intersection property, it also satisfies the split-freeness property (i.e., is monodependent), if and only if every lossless join decomposition of R with respect to F is also dependecy preserving. As a corollary of this result we are able to show that when F satisfies the intersection property, it also satisfies the intersection property, it also satisfies the split-freeness property(i.e., is monodependent), if and only if every lossless join decomposition of R, which is in BCNF, is also dependency preserving. Our final result is that when F is monodependent, then there exists a unique optimum lossless join decomposition of R, which is in BCNF, and is also dependency preserving. Furthermore, this ultimate decomposition can be attained in polynomial time in the size of F.
- Research Article
38
- 10.1016/j.is.2018.04.001
- Apr 13, 2018
- Information Systems
SQL schema design: foundations, normal forms, and normalization
- Research Article
22
- 10.1016/0020-0190(89)90065-3
- Jul 1, 1989
- Information Processing Letters
Boyce–Codd normal form and object normal forms
- Research Article
55
- 10.1145/319732.319749
- Sep 1, 1982
- ACM Transactions on Database Systems
This paper addresses the problem of database schema design in the framework of the relational data model and functional dependencies. It suggests that both Third Normal Form (3NF) and Boyce-Codd Normal Form (BCNF) supply an inadequate basis for relational schema design. The main problem with 3NF is that it is too forgiving and does not enforce the separation principle as strictly as it should. On the other hand, BCNF is incompatible with the principle of representation and prone to computational complexity. Thus a new normal form, which lies between these two and captures the salient qualities of both is proposed. The new normal form is stricter than 3NF, but it is still compatible with the representation principle. First a simpler definition of 3NF is derived, and the analogy of this new definition to the definition of BCNF is noted. This analogy is used to derive the new normal form. Finally, it is proved that Bernstein's algorithm for schema design synthesizes schemata that are already in the new normal form.
- Book Chapter
- 10.1016/b978-012369379-2/50002-9
- Jan 1, 2005
- Joe Celko's SQL for Smarties
Chapter 2 - Normalization
- Research Article
7
- 10.1016/j.dam.2007.02.014
- Jun 16, 2007
- Discrete Applied Mathematics
Non-deterministic ideal operators: An adequate tool for formalization in Data Bases
- Conference Article
22
- 10.1145/2882903.2915239
- Jun 14, 2016
Normalization helps us find a database schema at design time that can process the most frequent updates efficiently at run time. Unfortunately, relational normalization only works for idealized database instances in which duplicates and null markers are not present. On one hand, these features occur frequently in real-world data compliant with the industry standard SQL, and especially in modern application domains. On the other hand, the features impose challenges that have made it impossible so far to extend the existing forty year old normalization framework to SQL. We introduce a new class of functional dependencies and show that they provide the right notion for SQL schema design. Axiomatic and linear-time algorithmic characterizations of the associated implication problem are established. These foundations enable us to propose a Boyce-Codd normal form for SQL. Indeed, we justify the normal form by showing that it permits precisely those SQL instances which are free from data redundancy. Unlike the relational case, there are SQL schemata that cannot be converted into Boyce-Codd normal form. Nevertheless, for an expressive sub-class of our functional dependencies we establish a normalization algorithm that always produces a schema in Value-Redundancy free normal form. This normal form permits precisely those instances which are free from any redundant data value occurrences other than the null marker. Experiments show that our functional dependencies occur frequently in real-world data and that they are effective in eliminating redundant values from these data sets without loss of information.
- Book Chapter
- 10.1007/978-3-031-27177-9_1
- Jan 1, 2023
Data dependencies, such as functional dependencies (FDs), have been long recognized as integrity constraints in databases [42]. They are firstly utilized in database design [3]. Conventionally, data dependencies are extracted from application requirements for database standardization, and used in database design to ensure data quality. For instance, functional dependencies (FDs) [16] are employed to evaluate whether a relation is in third normal form (3NF) [42] or Boyce-Codd normal form (BCNF) [43]. FDs are extended to multivalued dependencies (MVDs) [13], i.e., every FD is also an MVD, in order to test whether a relation is in fourth normal form (4NF) [55]. Such data dependencies are also used in database query optimization [53, 86].
- Research Article
4
- 10.1016/j.is.2023.102208
- Mar 22, 2023
- Information Systems
Cardinality constraints and functional dependencies in SQL: Taming data redundancy in logical database design
- Research Article
441
- 10.1145/320064.320066
- Mar 1, 1979
- ACM Transactions on Database Systems
Problems related to functional dependencies and the algorithmic design of relational schemas are examined. Specifically, the following results are presented: (1) a tree model of derivations of functional dependencies from other functional dependencies; (2) a linear-time algorithm to test if a functional dependency is in the closure of a set of functional dependencies; (3) a quadratic-time implementation of Bernstein's third normal form schema synthesis algorithm. Furthermore, it is shown that most interesting algorithmic questions about Boyce-Codd normal form and keys are NP -complete and are therefore probably not amenable to fast algorithmic solutions.
- Conference Article
13
- 10.1145/2274576.2274589
- Mar 26, 2012
We introduce a new normal form, called essential tuple normal form (ETNF), for relations in a relational database where the constraints are given by functional dependencies and join dependencies. ETNF lies strictly between fourth normal form and fifth normal form (5NF, also known as projection-join normal form). We show that ETNF, although strictly weaker than 5NF, is exactly as effective as 5NF in eliminating redundancy of tuples. Our definition of ETNF is semantic, in that it is defined in terms of tuple redundancy. We give a syntactic characterization of ETNF, which says that a relation schema is in ETNF if and only if it is in Boyce-Codd normal form and some component of every explicitly declared join dependency of the schema is a superkey.
- New
- Research Article
- 10.1145/3771733
- Nov 6, 2025
- ACM Transactions on Database Systems
- New
- Research Article
- 10.1145/3774753
- Nov 4, 2025
- ACM Transactions on Database Systems
- Research Article
- 10.1145/3774316
- Nov 1, 2025
- ACM Transactions on Database Systems
- Research Article
- 10.1145/3716378
- Oct 25, 2025
- ACM Transactions on Database Systems
- Research Article
- 10.1145/3771766
- Oct 14, 2025
- ACM Transactions on Database Systems
- Research Article
- 10.1145/3770577
- Oct 2, 2025
- ACM Transactions on Database Systems
- Research Article
- 10.1145/3734517
- Sep 30, 2025
- ACM Transactions on Database Systems
- Research Article
- 10.1145/3760773
- Sep 29, 2025
- ACM Transactions on Database Systems
- Research Article
- 10.1145/3764583
- Sep 29, 2025
- ACM Transactions on Database Systems
- Research Article
- 10.1145/3743130
- Jul 26, 2025
- ACM Transactions on Database Systems
- Ask R Discovery
- Chat PDF
AI summaries and top papers from 250M+ research sources.