Self-Attention Networks for Code Search

Sen Fang,Yepang Liu,You-Shuai Tan,Tao Zhang

doi:10.1016/j.infsof.2021.106542

Abstract

Developers tend to search and reuse code snippets from a large-scale codebase when they want to implement some functions that exist in the previous projects, which can enhance the efficiency of software development. As the first deep learning-based code search model, DeepCS outperforms prior models such as Sourcere and CodeHow. However, it utilizes two separate LSTM to represent code snippets and natural language descriptions respectively, which ignores semantic relations between code snippets and their descriptions. Consequently, the performance of DeepCS falls into the bottleneck, and thus our objective is to break this bottleneck. We propose a self-attention joint representation learning model, named SAN-CS ( S elf- A ttention N etwork for C ode S earch). Comparing with DeepCS, we directly utilize the self-attention network to construct our code search model. By a weighted average operation, self-attention networks can fully capture the contextual information of code snippets and their descriptions. We first utilize two individual self-attention networks to represent code snippets and their descriptions, respectively, and then we utilize the self-attention network to conduct an extra joint representation network for code snippets and their descriptions, which can build semantic relationships between code snippets and their descriptions. Therefore, SAN-CS can break the performance bottleneck of DeepCS. We evaluate SAN-CS on the dataset shared by Gu et al. and choose two baseline models , DeepCS and CARLCS-CNN. Experimental results demonstrate that SAN-CS achieves significantly better performance than DeepCS and CARLCS-CNN. In addition, SAN-CS has better execution efficiency than DeepCS at the training and testing phase. This paper proposes a code search model, SAN-CS. It utilizes the self-attention network to perform the joint attention representations for code snippets and their descriptions, respectively. Experimental results verify the effectiveness and efficiency of SAN-CS.

Full Text