Abstract
Traditionally, software development teams in many industries have used copies of production databases, or their masked, anonymized, or obfuscated versions, for testing. However, privacy protection regulations, such as the General Data Protection Regulation (GDPR), prohibit such practices. In such situations, there is often a need to generate production-like test data, i.e., test data that is statistically representative of the production data and conforms to the domain’s constraints. In this paper, we address this need by presenting a novel approach for generating production-like test data using deep learning techniques and by studying the practical effectiveness of the proposed approach in industrial settings. We frame the generation of production-like test data as a language modeling problem. We then propose a general solution for test data generation and a framework for evaluating and comparing language models based on training effectiveness and on the representativeness and validity of the generated data. To evaluate the practical effectiveness of our solution, we apply it to a case study: the Norwegian National Population Registry (NPR). Within the context of NPR, we experiment with three of the most successful deep learning algorithms for language modeling, namely Recurrent Neural Networks (RNNs), Variational Autoencoders (VAEs), and Generative Adversarial Networks (GANs). Furthermore, we quantitatively evaluate and compare the effectiveness of these algorithms using the proposed evaluation framework. The results from our case study show that our approach can generate highly complex data that is statistically representative of the production data and conforms to the business rules of the domain. Moreover, test data generated with the RNN, which outperforms the other two algorithms, are syntactically and semantically valid in more than 97% of the cases and are highly representative of the real NPR data. The practical applicability of our approach is evident from its full deployment in a test environment at the NPR, where it generates production-like test data on the fly and at scale for use in integration testing between NPR and its data consumers.
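To illustrate the language-modeling framing mentioned above, the following is a minimal, hypothetical sketch (not the paper's actual implementation): records are serialized to text, a character-level LSTM learns the probability of the next character given the prefix, and new synthetic records are sampled character by character. The example records, model sizes, and training loop are illustrative assumptions only.

```python
# Hedged sketch: character-level language modeling of structured records with an LSTM.
import torch
import torch.nn as nn

# Hypothetical serialized records; real production data would be serialized similarly.
records = ["01018012345;OLSEN;OSLO\n", "15037054321;HANSEN;BERGEN\n"]
text = "".join(records)
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}

class CharLM(nn.Module):
    def __init__(self, vocab, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, 32)
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        h, state = self.lstm(self.emb(x), state)
        return self.out(h), state

model = CharLM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
ids = torch.tensor([[stoi[c] for c in text]])

# Teacher-forced training: predict each next character from the preceding ones.
for _ in range(200):
    logits, _ = model(ids[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, len(chars)), ids[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: feed the model its own output to generate a synthetic record.
x, state, sample = ids[:, :1], None, []
for _ in range(30):
    logits, state = model(x, state)
    x = torch.multinomial(torch.softmax(logits[:, -1], -1), 1)
    sample.append(chars[x.item()])
print("".join(sample))
```

A VAE or GAN could be slotted into the same framing by replacing the autoregressive sampler with latent-space decoding or adversarially trained generation, respectively; the evaluation framework described in the abstract would compare the resulting data on representativeness and validity.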