DURFEX: A Feature Extraction Technique for Efficient Detection of Duplicate Bug Reports

Korosh Koochekian Sabor, Alf Larsson, Abdelwahab Hamou-Lhadj

doi:10.1109/qrs.2017.35

Abstract

Detecting duplicate bug reports can reduce the time spent handling field crashes. This is especially important for software companies with a large client base, where multiple customers can submit bug reports caused by the same fault. Several techniques exist for detecting duplicate bug reports; many of them apply classification techniques to information extracted from stack traces, classifying each report by the functions invoked in the stack trace associated with it. The problem is that the stack traces in a typical bug repository may contain tens of thousands of distinct functions, which leads to the curse of dimensionality. In this paper, we propose a feature extraction technique that reduces the feature size while retaining the information that is most critical for classification. The approach starts by abstracting stack traces of function calls into sequences of package names, replacing each function with the package in which it is defined. We then segment these traces into N-grams of variable length and map them to fixed-size sparse feature vectors, which are used to measure the distance between the stack trace of an incoming bug report and the stack traces of a historical set of bug reports. A linear combination of stack trace similarity and non-textual fields such as component and severity is then used to measure the distance between the incoming bug report and the historical set of bug reports. We show the effectiveness of our approach by applying it to the Eclipse bug repository, which contains tens of thousands of bug reports. Our approach outperforms the approach that uses distinct function names, while significantly reducing the processing time.
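
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the similarity measure, the N-gram lengths, or the combination weights, so the cosine similarity, `max_n=2`, the `weights` values, the dictionary layout of a report, and all function names below are assumptions made for illustration. It also assumes Java-style frames of the form `package.Class.method`. A fixed-size vector as described in the abstract would additionally need a shared N-gram vocabulary built from the historical traces; the sketch uses a sparse counter instead.

```python
from collections import Counter
from math import sqrt


def to_packages(stack_trace):
    """Replace each function with its package.

    Assumption: frames look like 'package.Class.method', so the package
    is everything before the last two dot-separated parts.
    """
    packages = []
    for frame in stack_trace:
        parts = frame.split(".")
        packages.append(".".join(parts[:-2]) if len(parts) > 2 else frame)
    return packages


def ngram_features(packages, max_n=2):
    """Count every N-gram of length 1..max_n as a sparse vector."""
    features = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(packages) - n + 1):
            features[tuple(packages[i:i + n])] += 1
    return features


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors (assumed measure)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def report_similarity(new, old, weights=(0.6, 0.2, 0.2), max_n=2):
    """Linear combination of trace similarity and categorical field matches.

    The weights are placeholders, not values from the paper.
    """
    w_trace, w_component, w_severity = weights
    trace_sim = cosine_similarity(
        ngram_features(to_packages(new["trace"]), max_n),
        ngram_features(to_packages(old["trace"]), max_n),
    )
    return (w_trace * trace_sim
            + w_component * (new["component"] == old["component"])
            + w_severity * (new["severity"] == old["severity"]))
```

An incoming report would be scored against each historical report with `report_similarity`, and the highest-scoring reports proposed as duplicate candidates. Because many functions share a package, the N-gram vocabulary over packages is much smaller than the set of distinct function names, which is the dimensionality reduction the abstract refers to.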

