Query-oriented two-stage attention-based model for code search

Ling Xu, Luwen Huangfu, Chao Liu, Huanhuan Yang

doi:10.1016/j.jss.2023.111948

Abstract

Applying code search models to a large-scale codebase can significantly help developers find and reuse existing code. Researchers have applied deep learning (DL) techniques to code search models, which first compute deeper semantic representations for the query and the candidate code snippets, and then rank the code snippets. However, these models neither analyze the semantic gap in depth (i.e., the difference and correlation between queries written in natural language and code written in programming languages) nor suitably apply that correlation to the code search task. Moreover, most DL-based models use complex networks, which slows down code search.

To build the correlation between the two languages and apply it well to the code search task, we propose a query-oriented code search model named QobCS. QobCS leverages two attention-based stages, which are simple and fast, and the cooperation of the two stages bridges the semantic gap between code and query. Stage1 learns deeper semantic representations for code and query. Stage2 applies their deeper semantic correlation and the query’s intention to learn a better code representation.

We evaluated QobCS on two datasets. On dataset1 and dataset2, with 485k and 542k code snippets respectively, QobCS achieves MRRs of 0.701 and 0.595, outperforming the DL-based code search models DeepCS, CARLCS-CNN, UNIF, and our prior work TabCS. In terms of efficiency, our model shows desirable performance on both datasets compared to the DL-based models.
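
The MRR (mean reciprocal rank) figures reported above can be read with the help of a small sketch. The function below is a minimal illustration of the standard MRR definition, not the evaluation code used in this work; the function name, the convention of passing None for a query whose correct snippet is not retrieved, and the example ranks are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries.

    ranks[i] is the 1-based position of the first correct code snippet
    in the result list for query i, or None if it was not retrieved
    (assumed convention: such a query contributes 0 to the sum).
    """
    if not ranks:
        raise ValueError("need at least one query")
    total = sum(1.0 / r for r in ranks if r is not None)
    return total / len(ranks)


# Hypothetical ranks for four queries: first, second, not found, fourth.
example_ranks = [1, 2, None, 4]
example_mrr = mean_reciprocal_rank(example_ranks)  # (1 + 0.5 + 0 + 0.25) / 4
```

Under this definition, an MRR of 1.0 means the correct snippet is always ranked first, so a higher value indicates that relevant code tends to appear nearer the top of the result list.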

Full Text