Data Preparation: A Technological Perspective and Review

Alvaro A A Fernandes, Martin Koehler, Nikolaos Konstantinou, Pavel Pankin, Rizos Sakellariou, Norman W Paton

doi:10.1007/s42979-023-01828-8


Open Access

https://doi.org/10.1007/s42979-023-01828-8


Journal: SN Computer Science	Publication Date: Jun 2, 2023
Citations: 3	License type: open-access

Affiliation: University of Manchester

Abstract

Data analysis often uses data sets that were collected for different purposes. Indeed, new insights are often obtained by combining data sets that were produced independently of each other, for example by combining data from outside an organization with internal data resources. As a result, there is a need to discover, clean, integrate and restructure data into a form that is suitable for an intended analysis. Data preparation, also known as data wrangling, is the process by which data are transformed from their existing representation into a form that is suitable for analysis. In this paper, we review the state of the art in data preparation by: (i) describing functionalities that are central to data preparation pipelines, specifically profiling, matching, mapping, format transformation and data repair; and (ii) presenting how these capabilities surface in different approaches to data preparation that involve programming, writing workflows, interacting with individual data sets as tables, and automating aspects of the process. These functionalities and approaches are illustrated with reference to a running example that combines open government data with web-extracted real estate data.
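
As a rough illustration of the five functionalities named in the abstract, the sketch below runs a toy pipeline in plain Python. It is not taken from the paper: the records, column names (`post_code`, `crime_rank`), helper functions and the 0.8 similarity threshold are all invented for illustration, and each step is a deliberately simple stand-in for the far richer techniques the review surveys. In particular, "mapping" is reduced here to a single key-based join, and "repair" assumes that a postcode determines a city.

```python
from difflib import SequenceMatcher

# Hypothetical web-extracted real estate listings (all values invented).
listings = [
    {"post_code": "m1 1ae", "price": "£250,000", "city": "Manchester"},
    {"post_code": "M1 1AE", "price": "£310,000", "city": None},
    {"post_code": "M13 9PL", "price": "195000", "city": "Manchester"},
]
# Hypothetical open government data keyed by postcode (values invented).
deprivation = [
    {"postcode": "M1 1AE", "crime_rank": 120},
    {"postcode": "M13 9PL", "crime_rank": 340},
]

def profile(rows):
    """Profiling: count missing and distinct values per column."""
    return {
        col: {
            "missing": sum(r[col] is None for r in rows),
            "distinct": len({r[col] for r in rows if r[col] is not None}),
        }
        for col in rows[0]
    }

def match_columns(src_cols, tgt_cols, threshold=0.8):
    """Matching: pair column names whose string similarity meets a threshold."""
    def sim(a, b):
        return SequenceMatcher(None, a, b).ratio()
    pairs = {}
    for s in src_cols:
        best = max(tgt_cols, key=lambda t: sim(s, t))
        if sim(s, best) >= threshold:
            pairs[s] = best
    return pairs

def transform(row):
    """Format transformation: normalise postcode case and price strings."""
    out = dict(row)
    out["post_code"] = row["post_code"].upper()
    out["price"] = int(row["price"].replace("£", "").replace(",", ""))
    return out

def repair(rows):
    """Repair: fill a missing city from a row with the same postcode
    (assumes postcode determines city, which is an illustrative assumption)."""
    known = {r["post_code"]: r["city"] for r in rows if r["city"] is not None}
    return [{**r, "city": r["city"] or known.get(r["post_code"])} for r in rows]

def integrate(rows, other, key_pairs):
    """Mapping (simplified): join the two sources on the matched key column."""
    src_key, tgt_key = next(iter(key_pairs.items()))
    lookup = {o[tgt_key]: o for o in other}
    merged = []
    for r in rows:
        extra = {k: v for k, v in lookup.get(r[src_key], {}).items() if k != tgt_key}
        merged.append({**r, **extra})
    return merged

report = profile(listings)                      # inspect the raw listings
keys = match_columns(list(listings[0]), list(deprivation[0]))
clean = repair([transform(r) for r in listings])
result = integrate(clean, deprivation, keys)
```

Name similarity is only one possible matching signal; instance values or data types could be used instead, and a real pipeline would need to handle rows with no counterpart in the other source.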

