Abstract

Predicting the fix time of a bug is important for managing the resources and release milestones of a software development project. However, achieving high accuracy in bug-fix time prediction is considered non-trivial. We attribute this difficulty to the lack of continuous or posterior estimation based on developers’ subsequent activities after a bug is initially reported. In this paper, we formulate bug-fix time prediction as a continual update of estimates as more activities are observed. Logging data of bug-related activities streamed to a bug tracking system change the bug reports, enabling us to recalculate predictions over time. To do so, we propose a deep learning-based two-stage activity stream embedding model, DASENet, that employs (i) a merged network for extracting contextual features across different types of logs, and (ii) a sequence network for exploring temporal relations of the logs. Through experiments with bug tracking system datasets from open source projects including Firefox, Chromium, and Eclipse, we show that DASENet achieves stable performance; e.g., for the Firefox dataset, its top-1 accuracy is 4.6 to 8.5 % higher than that of other state-of-the-art works. Our approach also provides a transferable structure, yielding robust performance with a small dataset for different tasks: the DASENet model trained with a small dataset of about 900 samples (2 % of a full dataset) can show performance competitive with that of the other models trained with a full dataset. To the best of our knowledge, we are the first to employ deep learning on log streams in the context of bug-fix time prediction.

Highlights

  • Data in a bug tracking system are frequently used as an essential part of managing the schedule, quality, and resources of software development in both industry practice and academic literature

  • Several researchers investigated the problem of predicting bug-fix times, and most of this research addressed the problem by performing either regression [17], [23]–[25] or classification [18]–[22] based on the features extracted from the attributes of bug reports

  • In contrast to those deep learning-based approaches, we focus on the continual prediction of bug-fix times and adapt deep neural networks for analyzing log streams of bug-related activities

Summary

INTRODUCTION

Data in a bug tracking system are frequently used as an essential part of managing the schedule, quality, and resources of software development in both industry practice and academic literature.

Y. Lee et al.: Continual Prediction of Bug-Fix Time Using Deep Learning-Based Activity Stream Embedding.

  • We consider the heterogeneity of log types, and we develop joint learning and a merged network. This network structure facilitates automated feature extraction from various types of bug-related activity logs by combining a set of individual embedding networks, each structured for a specific log type.

  • We propose a two-stage learning model, DASENet (Deep learning-based Activity Stream Embedding Network), that leverages the integrated use of a merged network and a sequence network; the former combines different types of logging data into a per-day activity summation, and the latter generates an embedding that reflects all the accumulated per-day activities in a common vector space.

  • We first propose a continual approach for bug-fix time prediction by exploiting deep learning techniques with data streams of bug-related activity logs.

  • We present a data- and time-efficient transferring procedure for variant tasks, which leverages the activity stream embedding of DASENet.
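The two-stage idea described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the log types, feature vectors, dimensions, the plain tanh recurrence, and the three output classes are all assumptions chosen for illustration, and the weights are random rather than trained. The sketch only shows the data flow: logs are grouped into per-day bins, each log type is embedded separately and the results are concatenated (the merged stage), a recurrent state accumulates the per-day vectors (the sequence stage), and a new class distribution is produced after every day.

```python
import math
import random
from collections import defaultdict

# Hypothetical log types and toy sizes; none of these values come from the paper.
LOG_TYPES = ("comment", "status_change", "attachment")
EMBED_DIM = 4     # size of each per-type embedding
STATE_DIM = 6     # size of the running sequence state
NUM_CLASSES = 3   # illustrative fix-time classes, e.g. short / medium / long


def make_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def bin_by_day(logs):
    """Group (day, log_type, features) records into per-day, per-type bins."""
    bins = defaultdict(lambda: defaultdict(list))
    for day, log_type, features in logs:
        bins[day][log_type].append(features)
    return bins


class TwoStageSketch:
    def __init__(self, feature_dim, seed=0):
        rng = random.Random(seed)
        self.feature_dim = feature_dim
        # Stage 1: one embedding matrix per log type, merged by concatenation.
        self.type_weights = {
            t: make_matrix(EMBED_DIM, feature_dim, rng) for t in LOG_TYPES
        }
        merged_dim = EMBED_DIM * len(LOG_TYPES)
        # Stage 2: a plain recurrent cell over the per-day merged vectors.
        self.w_in = make_matrix(STATE_DIM, merged_dim, rng)
        self.w_rec = make_matrix(STATE_DIM, STATE_DIM, rng)
        self.w_out = make_matrix(NUM_CLASSES, STATE_DIM, rng)

    def embed_day(self, day_bin):
        """Sum the logs of each type, embed each sum, and concatenate."""
        merged = []
        for t in LOG_TYPES:
            total = [0.0] * self.feature_dim
            for features in day_bin.get(t, []):
                total = [a + b for a, b in zip(total, features)]
            merged.extend(math.tanh(v) for v in matvec(self.type_weights[t], total))
        return merged

    def step(self, state, day_vector):
        """Fold one day's merged vector into the running state."""
        pre = [
            a + b
            for a, b in zip(matvec(self.w_in, day_vector), matvec(self.w_rec, state))
        ]
        return [math.tanh(v) for v in pre]

    def predict(self, state):
        """Softmax over the illustrative fix-time classes."""
        logits = matvec(self.w_out, state)
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def continual_predictions(self, logs, num_days):
        """Return one class distribution per day, updated as logs accumulate."""
        bins = bin_by_day(logs)
        state = [0.0] * STATE_DIM
        outputs = []
        for day in range(num_days):
            state = self.step(state, self.embed_day(bins.get(day, {})))
            outputs.append(self.predict(state))
        return outputs


# Toy activity stream: (day index, log type, feature vector); all values invented.
logs = [
    (0, "comment", [1.0, 0.0]),
    (0, "status_change", [0.0, 1.0]),
    (2, "attachment", [1.0, 1.0]),
]
model = TwoStageSketch(feature_dim=2)
predictions = model.continual_predictions(logs, num_days=3)
```

Each entry of `predictions` is the estimate available at the end of that day, so a day without new logs still yields an updated distribution from the accumulated state. A trained model would replace the random matrices and, most likely, the simple recurrence with learned components.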

RELATED WORKS
STREAM DATA PREPROCESSING
ACTIVITY STREAM EMBEDDING
ACTIVITY BIN EMBEDDING
BIN-SEQUENCE EMBEDDING
DASENET MODEL IMPLEMENTATION
TASK LEARNING PROCESS
DATASETS
CONCLUSION