Chapter 10 - Scrubbing Data with Non-1NF Tables

Joe Celko

doi:10.1016/b978-012374137-0.50011-0

Abstract

This chapter examines methods of data scrubbing. "Data scrubbing" is the important job of getting clean, correct data into a database. Data from non-SQL sources usually comes with a common set of problems. Old file system layouts have to be reformatted and often split into many tables. Old encodings may have to be updated to current systems. If the data source is old, not all of its data types will map to native SQL data types. SQL does not require that a table have unique constraints, a primary key, or anything else that would ensure data integrity. Part of the scrubbing is to find which people have some or all of a set of target codes. An experienced SQL programmer's first thought is to normalize the repeated group, and the obvious way to do this is with a derived table. This fools experienced SQL programmers because they know that a schema should be in 1NF, so they immediately fix that problem without looking a bit further. The trick is to apply an IN () predicate directly to the repeating group, which returns just the names of those who have one or more of the target codes. Repeated groups of fields in a file system should be split out into multiple tables in a normalized schema. On the way to that goal, however, it is worth checking whether the values in each repeated group are sorted from left to right, because that ordering may carry meaning.
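
The following is a minimal sketch, not the chapter's own code. It assumes a hypothetical legacy table `LegacyPersonnel` with a five-column repeating group `code1` through `code5`, and hypothetical target codes 'A' and 'C'. Python's built-in `sqlite3` module serves only as a harness so that the SQL can be run. The sketch contrasts the IN () predicate applied directly to the repeating group with normalization through a derived table. It then flags rows whose codes are not sorted from left to right.

```python
import sqlite3

# Hypothetical legacy layout: a repeating group code1..code5 copied straight
# from a flat file, one row per person. Table name and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LegacyPersonnel (
    emp_name TEXT NOT NULL PRIMARY KEY,
    code1 TEXT, code2 TEXT, code3 TEXT, code4 TEXT, code5 TEXT
);
INSERT INTO LegacyPersonnel VALUES
    ('Able',    'A', 'B', 'C', NULL, NULL),
    ('Baker',   'C', 'A', NULL, NULL, NULL),
    ('Charles', 'C', 'D', NULL, NULL, NULL),
    ('Delta',   'E', NULL, NULL, NULL, NULL);
""")

def names(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# IN () trick: the constant goes on the left and the repeating group on the
# right, so no normalization is needed. OR means "some" of the targets.
any_target = names("""
SELECT emp_name FROM LegacyPersonnel
 WHERE 'A' IN (code1, code2, code3, code4, code5)
    OR 'C' IN (code1, code2, code3, code4, code5)
 ORDER BY emp_name;
""")

# Same trick with AND, meaning "all" of the targets.
all_targets = names("""
SELECT emp_name FROM LegacyPersonnel
 WHERE 'A' IN (code1, code2, code3, code4, code5)
   AND 'C' IN (code1, code2, code3, code4, code5)
 ORDER BY emp_name;
""")

# The longer route: normalize the repeating group in a derived table with
# UNION ALL, then do relational division with GROUP BY ... HAVING.
all_targets_normalized = names("""
SELECT N.emp_name
  FROM (SELECT emp_name, code1 AS code FROM LegacyPersonnel
        UNION ALL SELECT emp_name, code2 FROM LegacyPersonnel
        UNION ALL SELECT emp_name, code3 FROM LegacyPersonnel
        UNION ALL SELECT emp_name, code4 FROM LegacyPersonnel
        UNION ALL SELECT emp_name, code5 FROM LegacyPersonnel) AS N
 WHERE N.code IN ('A', 'C')
 GROUP BY N.emp_name
HAVING COUNT(DISTINCT N.code) = 2
 ORDER BY N.emp_name;
""")

# Scrubbing check before splitting the table: flag rows where two adjacent
# non-NULL codes are out of order. Any comparison with NULL is UNKNOWN, so a
# gap such as ('B', NULL, 'A') is NOT caught by this simple check.
unsorted_rows = names("""
SELECT emp_name FROM LegacyPersonnel
 WHERE code1 > code2 OR code2 > code3 OR code3 > code4 OR code4 > code5
 ORDER BY emp_name;
""")

print(any_target)              # ['Able', 'Baker', 'Charles']
print(all_targets)             # ['Able', 'Baker']
print(all_targets_normalized)  # ['Able', 'Baker']
print(unsorted_rows)           # ['Baker']
```

Both "all targets" queries return the same names. The IN () form avoids building the derived table. The `HAVING COUNT(DISTINCT N.code) = 2` test assumes exactly two target codes, so it would need to change along with the target list.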

From: Joe Celko's Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL