Deep code search efficiency based on clustering

Kun Liu, Haize Hu, Jianxun Liu

doi:10.1002/cpe.8027

Abstract

Deep‐learning‐based code search models typically treat accuracy as the sole criterion for judging performance and ignore search efficiency. This article proposes a clustering‐based code search model (C‐DCS). C‐DCS uses K‐Means to partition the code vector base into K clusters and obtains the center vector of each cluster. At search time, C‐DCS first matches the query vector against the K center vectors to find the best‐matching center vector. It then matches the query vector one by one against the code vectors in the cluster corresponding to that center and returns the best‐matching code snippet vector. To verify the efficiency of C‐DCS on the code search task, experiments were conducted on a large dataset. The results show that C‐DCS saves 92.2% of the search time compared with the baseline model while maintaining accuracy. In the experimental evaluation section, we further optimize the K‐Means algorithm to improve the search efficiency of C‐DCS, raising the search time saved to 93.8% relative to the baseline model. Hence, C‐DCS greatly reduces code search time without affecting accuracy, improving the efficiency of software development.
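The two‐stage search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 2‐D vectors, the function names (`kmeans`, `two_stage_search`), the plain Lloyd's K‐Means, and the use of cosine similarity for matching are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def squared_distance(a, b):
    """Squared Euclidean distance, used for the K-Means assignment step."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's K-Means. Returns (centers, clusters of vector indices)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    dim = len(vectors[0])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest center.
        clusters = [[] for _ in range(k)]
        for i, v in enumerate(vectors):
            nearest = min(range(k), key=lambda c: squared_distance(v, centers[c]))
            clusters[nearest].append(i)
        # Update step: move each center to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[c] = [
                    sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    return centers, clusters


def two_stage_search(query, vectors, centers, clusters):
    """Stage 1: pick the best-matching center. Stage 2: scan only its cluster."""
    non_empty = [c for c in range(len(centers)) if clusters[c]]
    best_center = max(non_empty, key=lambda c: cosine(query, centers[c]))
    return max(clusters[best_center], key=lambda i: cosine(query, vectors[i]))


# Hypothetical toy "code vector base": two well-separated groups.
code_vectors = [
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
    [0.1, 1.0], [0.2, 0.9], [0.0, 1.0],
]
centers, clusters = kmeans(code_vectors, k=2)
best_index = two_stage_search([0.2, 2.0], code_vectors, centers, clusters)
print("best matching code vector index:", best_index)
```

With N code vectors split into K roughly equal clusters, this scheme compares the query against about K + N/K vectors instead of all N, which is where the time saving comes from. In general, scanning a single cluster can miss a better match that sits in a neighboring cluster, so whether accuracy is preserved has to be checked empirically, as the experiments reported in the abstract do.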

Full Text