Studying and suggesting logging locations in code blocks

Zhenhao Li

doi:10.1145/3377812.3382168

Abstract

Developers write logging statements to generate logs and record system execution behaviors to assist in debugging and software maintenance. However, there exist no practical guidelines on where to write logging statements. On one hand, adding too many logging statements may introduce superfluous, trivial logs and performance overhead. On the other hand, logging too little may miss necessary runtime information. Thus, properly deciding the logging location is a challenging task, and a finer-grained understanding of where to write logging statements is needed to assist developers in making logging decisions. In this paper, we conduct a comprehensive study to uncover guidelines on logging locations at the code block level. We analyze logging statements and their surrounding code by combining deep learning techniques with manual investigation. From our preliminary results, we find that our deep learning models achieve over 90% precision and recall when trained using syntactic features (e.g., nodes in the abstract syntax tree) and semantic features (e.g., variable names). However, cross-system models trained using semantic features reach only 45.6% precision and 73.2% recall, while cross-system models trained using syntactic features still achieve over 90% precision and recall. Our current progress highlights that there is an implicit syntactic logging guideline across systems, and that such information may be leveraged to uncover general logging guidelines.
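To make the distinction between syntactic and semantic features concrete, the following is a minimal sketch of how one code block could be split into the two feature types and labeled as logged or not logged. It is an illustration only and not the authors' pipeline: the abstract does not state the programming language of the studied systems or the feature extraction tooling, so Python's standard `ast` module is an assumption made for brevity, and the names `block_features`, `LOGGER_NAMES`, and `LOG_METHODS` are hypothetical.

```python
import ast

# Hypothetical conventions for recognizing a logging call such as logger.error(...).
LOGGER_NAMES = {"logger", "logging", "log"}
LOG_METHODS = {"debug", "info", "warning", "error", "critical", "exception"}


def block_features(source: str) -> dict:
    """Split one code block into syntactic and semantic features plus a label."""
    tree = ast.parse(source)
    syntactic = []  # AST node type names, e.g. "Try", "ExceptHandler"
    semantic = []   # identifier text, e.g. variable and function names
    logged = False  # label: does the block contain a logging call?

    for node in ast.walk(tree):
        # Syntactic feature: the kind of AST node, independent of naming.
        syntactic.append(type(node).__name__)

        # Semantic feature: the names that developers chose.
        if isinstance(node, ast.Name):
            semantic.append(node.id)
        elif isinstance(node, ast.Attribute):
            semantic.append(node.attr)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            semantic.append(node.name)
        elif isinstance(node, ast.arg):
            semantic.append(node.arg)

        # Label: a call of the form <logger>.<level>(...).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in LOG_METHODS
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id in LOGGER_NAMES
        ):
            logged = True

    return {"syntactic": syntactic, "semantic": semantic, "logged": logged}


example = """
try:
    conn = open_connection(host)
except IOError as err:
    logger.error("connect failed: %s", err)
"""

feats = block_features(example)
print(sorted(set(feats["syntactic"])))
print(sorted(set(feats["semantic"])))
print(feats["logged"])
```

In a real prediction setting, the logging statement itself would presumably be removed from the block before feature extraction so that the model cannot simply read off the label. The sketch also hints at why syntactic features may transfer better across systems: node types such as `Try` and `ExceptHandler` are shared by every project, whereas identifiers such as `conn` or `host` are project-specific.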

Full Text