Identifying Recurring Faulty Functions in Field Traces of a Large Industrial Software System

Syed Shariyar Murtaza, Nazim H Madhavji, Mechelle Gittens, Abdelwahab Hamou-Lhadj

doi:10.1109/tr.2014.2366274

Abstract

Software maintainers use traces of field failures to understand and diagnose the faulty functions that cause a system to fail. Useful as they are, field traces can be overwhelming, especially for software systems with a vast client base. Realistic applications, many of which span millions of lines of code, generate an unmanageable number of traces, and the individual traces are known to be extraordinarily large, which further complicates matters. Fortunately, not all field failures are caused by new faults: previous studies have shown that 50% to 90% of field failures are due to previously known faults. In this paper, we propose a machine learning approach that automatically detects recurring faulty functions in the traces of new field failures. We do so by training decision trees on previously resolved traces of system failures from the current and prior releases of the system. When applied to a large industrial system with 20 million lines of code and 200,000 functions, our approach detected recurring faulty functions in the traces of field failures with an accuracy of 90%, rising to 97% in some cases.
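To make the idea of training decision trees on resolved failure traces concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes that each trace can be reduced to the set of function names it contains and that each resolved trace is labeled with the function that was eventually fixed. The function names, the toy traces, and the simple information-gain tree builder are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of fault labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(traces, labels, function):
    """Entropy reduction from splitting traces on whether `function` appears."""
    called = [y for t, y in zip(traces, labels) if function in t]
    not_called = [y for t, y in zip(traces, labels) if function not in t]
    remainder = sum(
        len(part) / len(labels) * entropy(part)
        for part in (called, not_called)
        if part
    )
    return entropy(labels) - remainder


def build_tree(traces, labels):
    """Return a leaf (a fault label) or a node (function, if_called, if_not_called)."""
    functions = sorted(set().union(*traces))
    if len(set(labels)) == 1 or not functions:
        return Counter(labels).most_common(1)[0][0]
    best = max(functions, key=lambda f: information_gain(traces, labels, f))
    if information_gain(traces, labels, best) <= 1e-12:
        # No function separates the remaining traces: fall back to the majority label.
        return Counter(labels).most_common(1)[0][0]
    called = [(t, y) for t, y in zip(traces, labels) if best in t]
    not_called = [(t, y) for t, y in zip(traces, labels) if best not in t]
    return (
        best,
        build_tree(*map(list, zip(*called))),
        build_tree(*map(list, zip(*not_called))),
    )


def predict(tree, trace):
    """Walk the tree using the functions present in a new failure trace."""
    while isinstance(tree, tuple):
        function, if_called, if_not_called = tree
        tree = if_called if function in trace else if_not_called
    return tree


# Hypothetical resolved traces: the set of functions seen in each failure trace,
# paired with the function that maintainers identified as faulty.
resolved_traces = [
    {"open_cursor", "fetch_row", "lock_table"},
    {"open_cursor", "lock_table", "commit"},
    {"parse_sql", "fetch_row", "alloc_buffer"},
    {"parse_sql", "alloc_buffer", "commit"},
]
faulty_functions = ["lock_table", "lock_table", "alloc_buffer", "alloc_buffer"]

tree = build_tree(resolved_traces, faulty_functions)

# A new field-failure trace is matched against the previously known faults.
new_trace = {"parse_sql", "alloc_buffer", "fetch_row"}
print(predict(tree, new_trace))
```

In this toy setting a single tree distinguishes between known faults. A trace caused by a genuinely new fault would still be assigned one of the known labels, so a practical system would need some way to reject low-confidence matches.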

Full Text