Data profiling revisited

Felix Naumann

doi:10.1145/2590989.2590995

Abstract

Data profiling comprises a broad range of methods to efficiently analyze a given data set. In a typical scenario, which mirrors the capabilities of commercial data profiling tools, tables of a relational database are scanned to derive metadata, such as data types and value patterns, completeness and uniqueness of columns, keys and foreign keys, and occasionally functional dependencies and association rules. Individual research projects have proposed several additional profiling tasks, such as the discovery of inclusion dependencies or conditional functional dependencies. Data profiling deserves a fresh look for two reasons: First, the area itself is neither established nor defined in any principled way, despite significant research activity on individual parts in the past. Second, more and more data beyond the traditional relational databases are being created and beg to be profiled. The article proposes new research directions and challenges, including interactive and incremental profiling and profiling heterogeneous and non-relational data.
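As an illustration of the basic tasks named in the abstract, the following minimal sketch computes completeness, distinct counts, uniqueness, and observed value types for a single column, and checks whether one functional dependency holds. It is not taken from the article: the function names `profile_column` and `holds_fd`, the sample rows, and the convention that `None` marks a missing value are all assumptions made for this example.

```python
from collections import Counter


def profile_column(values):
    """Basic single-column metadata; None is treated as a missing value (assumption)."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    return {
        # Fraction of rows that carry a value
        "completeness": len(non_null) / total if total else 0.0,
        # Number of different non-missing values
        "distinct": len(distinct),
        # A column is a key candidate if it has no gaps and no repeats
        "is_unique": total > 0 and len(non_null) == total and len(distinct) == total,
        # Observed Python types, as a rough stand-in for data type detection
        "types": Counter(type(v).__name__ for v in non_null),
    }


def holds_fd(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in rows (list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # The first right-hand side seen for a key must match every later one
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical sample table
rows = [
    {"id": 1, "zip": "14482", "city": "Potsdam"},
    {"id": 2, "zip": "14482", "city": "Potsdam"},
    {"id": 3, "zip": "10115", "city": None},
]

id_profile = profile_column([r["id"] for r in rows])
city_profile = profile_column([r["city"] for r in rows])
zip_determines_city = holds_fd(rows, ["zip"], ["city"])
city_determines_id = holds_fd(rows, ["city"], ["id"])
```

On this sample, `id` qualifies as a key candidate, `city` is two-thirds complete, and `zip -> city` holds while `city -> id` does not. This sketch only verifies a given dependency; discovering all dependencies in a table requires searching a much larger space of column combinations.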

Journal: ACM SIGMOD Record
Publication Date: Feb 28, 2014
Citations: 175

Similar Papers

Discover Dependencies from Data—A Review
Jixue Liu ... Yongfeng Chen
IEEE Transactions on Knowledge and Data Engineering | VOL. 24
01 Feb 2012

Revisiting Conditional Functional Dependency Discovery: Splitting the “C” from the “FD”
Joeri Rammelaere ... Floris Geerts
01 Jan 2019

Mining Constant Conditional Functional Dependencies for Improving Data Quality
D Devikalyani
International Journal of Computer Applications | VOL. 74
26 Jul 2013

Discovering Conditional Functional Dependencies
Wenfei Fan ... Jianzhong Li
IEEE Transactions on Knowledge and Data Engineering | VOL. 23
01 May 2011
