Empirical Evaluation of Mimic Software Project Data Sets for Software Effort Estimation

Maohua Gan, Kentaro Sasaki, Akito Monden, Zeynep Yücel

doi:10.1587/transinf.2019edp7150

Open Access

https://doi.org/10.1587/transinf.2019edp7150

Abstract

To conduct empirical research on industrial software development, researchers need data from real industry software projects. However, only a few such data sets are publicly available, and most of them are very old. Moreover, most software companies today cannot release their data, because software development involves many stakeholders and the confidentiality of project data must be strictly preserved. To address this problem, this study proposes a method for artificially generating a “mimic” software project data set whose characteristics (such as average, standard deviation, and correlation coefficients) are very similar to those of a given confidential data set. Researchers can then use the mimic data set in place of the original (confidential) one and are expected to obtain similar results. The proposed method uses the Box-Muller transform to generate normally distributed random numbers, and exponential transformation and number reordering for data mimicry. To evaluate the efficacy of the proposed method, software effort estimation is considered as a potential application domain for mimic data. Estimation models are built from 8 reference data sets and their corresponding mimic data sets. Our experiments confirmed that models built from mimic data sets show effort estimation performance similar to that of models built from the original data sets, which indicates that the proposed method is capable of generating representative samples.
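
The abstract names the Box-Muller transform as the source of normally distributed random numbers. The following is a minimal sketch of that single step only, not the authors' implementation. The function names `box_muller` and `normal_samples`, the seed, and the `mean`/`std` parameters are illustrative assumptions. The exponential transformation and number reordering steps are not shown, because the abstract does not specify them.

```python
import math
import random


def box_muller(rng):
    """Turn two uniform samples into two independent standard normal samples."""
    # rng.random() lies in [0, 1); subtracting from 1 gives (0, 1], so log() is safe.
    u1 = 1.0 - rng.random()
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal_samples(n, mean=0.0, std=1.0, seed=0):
    """Draw n samples from N(mean, std^2) using the Box-Muller transform."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    samples = []
    while len(samples) < n:
        z0, z1 = box_muller(rng)
        # Shift and scale the standard normal values to the target mean and std.
        samples.append(mean + std * z0)
        samples.append(mean + std * z1)
    return samples[:n]  # drop the extra value when n is odd
```

Calling `normal_samples(1000, mean=5.0, std=2.0)` would return 1000 values whose sample average and standard deviation are close to the requested ones, which is the kind of property a mimic data set is meant to share with its original.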

Journal: IEICE Transactions on Information and Systems
Publication Date: Oct 1, 2020
Citations: 1
License type: free
