Abstract
This chapter appraises methods of data scrubbing, the work of getting clean, consistent data into a database. Data from non-SQL sources brings a set of common problems: old file-system layouts must be reformatted and often split into many tables; legacy encodings may have to be converted to current character sets; and not every legacy data type maps to a native SQL data type. SQL itself does not require that a table have a primary key, unique constraints, or anything else that would ensure data integrity. The running example in the scrubbing is to find which people hold some or all of a particular set of codes stored as a repeated group of columns. The first thought of an experienced SQL programmer is to normalize the repeated group, most obviously with a derived table. This is precisely what fools experienced SQL programmers: they know a schema should be in first normal form (1NF), so they fix that problem immediately without looking any further. The trick is instead to use an IN () predicate whose value list is the columns of the repeating group, which yields just the names of those who hold one or more of the target codes. Repeated groups of fields in a file system should eventually be split out into multiple tables in a normalized schema, but on the way to that goal it pays to check whether the values in each repeated group are sorted from left to right, because that ordering can carry meaning.
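The two queries described above can be sketched as follows. This is a minimal illustration, not the chapter's own example: the table name, column names, and sample data are hypothetical, and SQLite (via Python's sqlite3 module) stands in for a full SQL platform. The first query uses the IN () trick directly on the repeating group; the second shows the derived-table approach that flattens the group into 1NF.

```python
import sqlite3

# Hypothetical denormalized table: a repeating group of three code
# columns, as it might arrive from an old file-system layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Personnel (
  name  TEXT NOT NULL PRIMARY KEY,
  code1 TEXT, code2 TEXT, code3 TEXT);
INSERT INTO Personnel VALUES
  ('Able',    'A1', 'B2', NULL),
  ('Baker',   'C3', NULL, NULL),
  ('Charlie', 'B2', 'C3', 'A1');
""")

# The trick: put the target value on the LEFT of an IN () predicate
# whose list is the columns of the repeating group.  NULL columns
# simply fail the comparison, so they do no harm here.
target = "B2"
holders = [r[0] for r in conn.execute(
    "SELECT name FROM Personnel WHERE ? IN (code1, code2, code3) "
    "ORDER BY name", (target,))]
print(holders)  # names holding code 'B2'

# The "obvious" fix: normalize the repeated group with a derived
# table built from a UNION ALL, one branch per column of the group.
normalized = conn.execute("""
    SELECT name, code
      FROM (SELECT name, code1 AS code FROM Personnel
            UNION ALL
            SELECT name, code2 FROM Personnel
            UNION ALL
            SELECT name, code3 FROM Personnel)
     WHERE code IS NOT NULL
     ORDER BY name, code""").fetchall()
print(normalized)  # (name, code) pairs in 1NF
```

The IN () version answers the "who has this code?" question in one pass without restructuring anything, which is why it is the better scrubbing tool; the derived-table version is still the right shape for the final normalized schema.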
From: Joe Celko's Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL