Abstract

Continuously extracting and integrating changed data from heterogeneous systems through an appropriate extraction model is key to data sharing and integration, and to building an incremental data warehouse for analysis. The traditional timestamp-based change data capture method is vulnerable to anomalies during the extraction process, which can cause extraction failures and reduce extraction efficiency. To address these problems, this paper improves the traditional timestamp-based incremental capture model and proposes VTWM, an incremental data extraction model based on variable time windows, built on the idea of deliberately extracting a small number of duplicate records and then removing the duplicates. The model reduces the impact of anomalies on data extraction, improves the reliability of traditional ETL extraction processes, and increases data extraction efficiency.
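The core idea described in the abstract, widening the extraction window so that a few already-loaded records are re-extracted and then collapsed by primary key, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the table names, column names, and overlap value are assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timedelta

# Minimal sketch of "extract a small overlap, then de-duplicate".
# All table and column names here are illustrative assumptions.
OVERLAP = timedelta(seconds=5)  # the time window is widened backwards by this much

def extract_window(conn, last_extracted_at):
    """Pull every source row stamped at or after the widened window start."""
    window_start = (last_extracted_at - OVERLAP).isoformat(sep=" ")
    return conn.execute(
        "SELECT id, payload, updated_at FROM source WHERE updated_at >= ?",
        (window_start,),
    ).fetchall()

def load_deduplicated(conn, rows):
    """Upsert by primary key so re-extracted duplicates collapse to one row."""
    conn.executemany(
        "INSERT INTO target (id, payload, updated_at) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET payload = excluded.payload, "
        "updated_at = excluded.updated_at",
        rows,
    )
    conn.commit()

# Demo: two extraction runs with overlapping windows still yield one row per id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, payload TEXT, updated_at TEXT)")
conn.execute(
    "CREATE TABLE target (id INTEGER PRIMARY KEY, payload TEXT, updated_at TEXT)"
)
base = datetime(2024, 1, 1, 12, 0, 0)
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?)",
    [(i, f"row-{i}", (base + timedelta(seconds=i)).isoformat(sep=" "))
     for i in range(10)],
)
load_deduplicated(conn, extract_window(conn, base))                          # first run
load_deduplicated(conn, extract_window(conn, base + timedelta(seconds=6)))   # overlapping re-run
count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)  # 10 distinct ids despite the second run re-reading earlier rows
```

Because the target upserts on the primary key, re-reading rows inside the overlap is harmless, which is what makes the widened window tolerant of extraction anomalies.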

Highlights

  • In enterprises and government departments, systems built at different times by different development organizations often result in multiple heterogeneous information systems running simultaneously on different hardware and software platforms

  • There are three main approaches to change data capture: the log-based approach [9,10,11,12,13], the trigger-based approach [2,15], and the timestamp-based approach [3,5,16]. A full-table comparison incremental extraction method based on database transaction log files, called the L-C incremental extraction method, is proposed in [10]

  • Incremental timestamp-based data extraction is achieved by maintaining an additional database table to store the time of the last data extraction [5]
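The auxiliary-table pattern mentioned in the last highlight, keeping the time of the most recent extraction in a separate control table, might look like the following sketch. The control table's name, columns, and the job key are hypothetical, not taken from [5].

```python
import sqlite3

# Hypothetical ETL control table recording each job's last extraction time.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE etl_control (job TEXT PRIMARY KEY, last_extracted_at TEXT)"
)

def get_last_extracted(conn, job):
    """Read the stored timestamp; fall back to the epoch on the first run."""
    row = conn.execute(
        "SELECT last_extracted_at FROM etl_control WHERE job = ?", (job,)
    ).fetchone()
    return row[0] if row else "1970-01-01 00:00:00"

def set_last_extracted(conn, job, ts):
    """Record the high-water mark after a successful extraction."""
    conn.execute(
        "INSERT INTO etl_control (job, last_extracted_at) VALUES (?, ?) "
        "ON CONFLICT(job) DO UPDATE SET "
        "last_extracted_at = excluded.last_extracted_at",
        (job, ts),
    )
    conn.commit()

set_last_extracted(conn, "orders", "2024-01-01 12:00:00")
print(get_last_extracted(conn, "orders"))   # 2024-01-01 12:00:00
print(get_last_extracted(conn, "invoices")) # 1970-01-01 00:00:00 (never run)
```

Each extraction run reads this high-water mark, selects source rows stamped after it, and writes the new mark back only once the load has committed.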


Summary

Introduction

In enterprises and government departments, systems built at different times by different development organizations often result in multiple heterogeneous information systems running simultaneously on different hardware and software platforms. The main contribution of the VTWM model proposed in this paper is that it alleviates two problems of the traditional model: database rollbacks caused by exceptions during the extraction operation, and the declining efficiency of the de-duplication operation as the data volume of the target table grows. It reduces the impact of anomalies on data extraction while preserving extraction efficiency under the premise of ensuring reliability.

Related Work
Traditional incremental timestamp-based data extraction model
Definitions
Data deduplication
Implementation
Experiment and Analysis
Experimental environment
Comparison and analysis of reliability
Comparison and analysis of time performance
Conclusion

