Abstract

This chapter examines the most important technology available to the data quality assurance team: data profiling. Data profiling is the application of data analysis techniques to existing data stores to determine the actual content, structure, and quality of the data. This distinguishes it from data analysis techniques used to derive business information from data. Data profiling starts from the assumption that any available metadata describing rules for correctness of the data is either wrong or incomplete. The process produces accurate metadata as an output by reverse-engineering metadata from the data itself and comparing it with the proposed metadata. Data profiling is thus a process of learning from the data: it employs discovery and analytical techniques to find characteristics of the data that a business analyst can then review to determine whether the data matches the business intent. Data profiling is usually done by a single analyst or a small team of analysts who perform most of the analytical work, with several other participants adding value to that analysis. The data profiling analyst is generally part of the data quality assurance team.
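
The idea of reverse-engineering metadata from the data and comparing it with the proposed metadata can be illustrated with a minimal sketch. It is not the chapter's own method: the `ColumnProfile` fields, the helper names, the sample values, and the `documented` dictionary are all hypothetical, chosen only to show the shape of the comparison for a single column.

```python
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    """Metadata reverse-engineered from the values of one column."""
    inferred_type: str   # "int", "float", or "str"
    nullable: bool       # True if any value was missing
    max_length: int      # longest non-missing value, in characters
    distinct_count: int  # number of distinct non-missing values


def infer_type(values):
    """Return the narrowest of int/float/str that fits every value."""
    for name, caster in (("int", int), ("float", float)):
        try:
            for v in values:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"


def profile_column(raw_values):
    """Discover basic properties of a column from its raw string values."""
    present = [v for v in raw_values if v not in ("", None)]
    return ColumnProfile(
        inferred_type=infer_type(present),
        nullable=len(present) < len(raw_values),
        max_length=max((len(v) for v in present), default=0),
        distinct_count=len(set(present)),
    )


def compare_to_documented(profile, documented):
    """List the points where the data contradicts the documented metadata."""
    findings = []
    if profile.inferred_type != documented["type"]:
        findings.append(
            f"type: documented {documented['type']}, "
            f"data looks like {profile.inferred_type}"
        )
    if profile.nullable and not documented["nullable"]:
        findings.append("nullable: documented NOT NULL, missing values found")
    if profile.max_length > documented["max_length"]:
        findings.append(
            f"length: documented max {documented['max_length']}, "
            f"found {profile.max_length}"
        )
    return findings


# Hypothetical column values and hypothetical documented (proposed) metadata.
values = ["1001", "1002", "", "A-17", "1002"]
documented = {"type": "int", "nullable": False, "max_length": 4}

profile = profile_column(values)
findings = compare_to_documented(profile, documented)
for finding in findings:
    print(finding)
```

The findings are only candidates for review. Whether a value such as `A-17` is a data error or a sign that the documented metadata is wrong is a judgment for the business analyst, not something the code can decide.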
