Which Variables Should I Log?

Zhongxin Liu, Ahmed E Hassan, Zhenchang Xing, Shanping Li, David Lo, Xin Xia

doi:10.1109/tse.2019.2941943

Abstract

Developers usually rely on logging statements inserted into source code to collect runtime information about a system. Such logged information is valuable for software maintenance. A logging statement usually prints one or more variables to record vital system status. However, because rigorous logging guidance is lacking and domain-specific knowledge is required, it is not easy for developers to decide which variables to log. To address this need, we propose an approach that recommends logging variables to developers during development by learning from existing logging statements. Unlike other prediction tasks in software engineering, this task poses two challenges: 1) Dynamic labels: different logging statements have different sets of accessible variables, so the set of possible labels differs from sample to sample. 2) Out-of-vocabulary words: identifier names are not limited to natural language words, and the test set usually contains many program tokens that fall outside the vocabulary built from the training set and cannot be appropriately mapped to word embeddings. To deal with the first challenge, we cast the task as a representation learning problem rather than a multi-label classification problem. Given a code snippet that lacks a logging statement, our approach first uses a neural network with a recurrent neural network (RNN) layer and a self-attention layer to learn a representation of each program token, and then predicts whether each token should be logged using a unified binary classifier over the learned representations. To handle the second challenge, we propose a novel method that maps program tokens to word embeddings by making use of pre-trained word embeddings of natural language tokens. We evaluate our approach on 9 large, high-quality Java projects. Our evaluation results show that the average MAP (mean average precision) of our approach is over 0.84, outperforming random guessing and an information-retrieval-based method by large margins.
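
To make the reported metric concrete, the following is a minimal sketch of how mean average precision could be computed for this kind of recommendation task. It is not the authors' evaluation code. The function names, variable names, and sample data are hypothetical. It assumes each sample is a ranked list of the variables accessible at one logging point, paired with the set of variables the developer actually logged. Each sample carries its own candidate list, which mirrors the dynamic-labels setting described in the abstract.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], logged: Set[str]) -> float:
    """Average precision of one ranked candidate list.

    ranked: candidate variables, best-scored first (hypothetical model output).
    logged: variables that the real logging statement printed (ground truth).
    """
    if not logged:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, name in enumerate(ranked, start=1):
        if name in logged:
            hits += 1
            # Precision at the rank where a relevant variable appears.
            precision_sum += hits / rank
    return precision_sum / len(logged)


def mean_average_precision(samples: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-sample average precision values."""
    samples = list(samples)
    if not samples:
        return 0.0
    return sum(average_precision(r, g) for r, g in samples) / len(samples)


# Illustrative data only: two logging points with different candidate sets.
samples = [
    (["userId", "request", "retryCount"], {"userId"}),
    (["buffer", "fileName", "offset"], {"fileName", "offset"}),
]

print(round(mean_average_precision(samples), 3))  # 0.792
```

In the first sample the logged variable is ranked first, giving an average precision of 1.0. In the second, the two logged variables sit at ranks 2 and 3, giving (1/2 + 2/3) / 2, so the mean over both samples is about 0.792.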

Journal: IEEE Transactions on Software Engineering
Publication Date: Jan 1, 2019
Citations: 33
License type: cc-by-nc-nd

Similar Papers

Studying software logging using topic models
Heng Li ... Tse-Hsun Chen
Empirical Software Engineering | VOL. 23
30 Jan 2018

Logging statements' prediction based on source code clones
Sina Gholamian ... Paul A S Ward
30 Mar 2020

Which log level should developers choose for a new logging statement? (journal-first abstract)
Heng Li ... Weiyi Shang
01 Mar 2018

From word embeddings to document similarities for improved information retrieval in software engineering
Xin Ye ... Xiao Ma
14 May 2016
