CCCS: Contrastive Cross-Language Code Search Using Code Graph Information

Li Kuang, Honghao Gao, Yi Cheng

doi:10.1145/3628429

Abstract

Developers often search for and reuse existing code snippets to improve software development efficiency. Researchers have proposed many code search methods, but in most of them the search intent is expressed as a natural language query. To support code migration and code refactoring, it is also necessary to use a code snippet in one programming language to search for relevant code snippets in another programming language. In this paper, we propose a Contrastive Cross-language Code Search method using code graph information, called CCCS. CCCS first converts code snippets into high-dimensional vectors using pre-trained CodeBERT to extract the sequence features of code snippets. Next, the structural features of code snippets are extracted using a graph convolutional neural network. Finally, the model is trained using the contrastive learning method to optimize the vector representation of cross-language code snippets, enabling the model to recognize code snippets in different programming languages that have the same functionality. To evaluate the effectiveness of our method, we conducted comparison experiments and ablation experiments on both a small-scale dataset and a large-scale dataset. The experimental results show that our method substantially outperforms the state-of-the-art baseline model in terms of the MRR metric.
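The contrastive training objective and the MRR evaluation mentioned above can be illustrated with a minimal sketch. The abstract does not state which contrastive loss CCCS uses, so the InfoNCE-style loss below is an assumption, as are the function names, the temperature value, and the toy two-dimensional vectors, which merely stand in for the embeddings that CodeBERT and the graph convolutional network would produce.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(queries, candidates, temperature=0.1):
    """Mean InfoNCE-style loss (assumed, not taken from the paper).

    queries[i] and candidates[i] form a positive pair (same functionality,
    different language); every other candidate acts as a negative.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, c) / temperature for c in candidates]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(queries)

def mrr(queries, candidates):
    """Mean reciprocal rank of the true match candidates[i] for queries[i]."""
    total = 0.0
    for i, q in enumerate(queries):
        scores = [cosine(q, c) for c in candidates]
        # Rank = 1 + number of candidates scored strictly above the true match.
        rank = 1 + sum(1 for s in scores if s > scores[i])
        total += 1.0 / rank
    return total / len(queries)

# Hypothetical embeddings: row i of each list represents the same function.
java_vecs = [[1.0, 0.0], [0.0, 1.0]]
python_vecs = [[0.9, 0.1], [0.2, 0.8]]

aligned_loss = info_nce(java_vecs, python_vecs)
shuffled_loss = info_nce(java_vecs, python_vecs[::-1])
aligned_mrr = mrr(java_vecs, python_vecs)
shuffled_mrr = mrr(java_vecs, python_vecs[::-1])
```

In this toy setup the correctly paired embeddings yield a lower loss and a higher MRR than the deliberately mismatched pairing, which is the behavior a contrastive objective is meant to encourage.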

Full Text