Abstract

A tool that can search a large code corpus directly and return ranked snippets can be an invaluable resource for programmers looking for code using natural language queries. Such a tool must understand the semantics of both source code and queries deeply enough to evaluate their intent correctly. Over the years, many tools that rely on textual similarity between source code and query have proven ineffective because they fail to learn a high-level semantic understanding of either. Previous deep neural network models for code search perform well, but most of them are evaluated on only a single programming language, mostly Java. In this paper, we propose a novel deep neural network model called Unified Code Net that can handle the intricacies of different programming languages. The model borrows several vital features from previous models and builds on those ideas to form a unified model that generates document vector embeddings from source code and, through similarity search against the query vector embedding, returns the most similar code snippets in any language. This tool can drastically reduce the effort a programmer spends looking for an efficient and viable code snippet for the problem at hand, and ideally it can replace the use of search engines for that purpose.

Highlights

  • Code search can substantially boost programmer productivity: the recent uptick in the use of deep learning for code search, together with the rise in computing power, has made it possible to retrieve code from a massive corpus that matches the programmer's intent as expressed in a natural language query.

  • This saves the programmer the hassle of searching Google for related code snippets, or endlessly browsing community forums such as StackOverflow, to find usage examples for a proprietary API or implementations of standard coding problems and algorithms.

  • Semantic code search makes it possible to search for such snippets directly with natural language queries and get back ranked, semantically similar code snippets in the required language.

Summary

INTRODUCTION

Code search can substantially boost programmer productivity: the recent uptick in the use of deep learning for code search, together with the rise in computing power, has made it possible to retrieve code from a massive corpus that matches the programmer's intent as expressed in a natural language query. This saves the programmer the hassle of searching Google for related code snippets, or endlessly browsing community forums such as StackOverflow, to find usage examples for a proprietary API or implementations of standard coding problems and algorithms. A simple query like "How to read text file line by line?" returns ranked snippets in the required language.

[Figure: Top result for Java. Query: "Read Text File Line by Line"]

In these examples, the document vectors and the query vector are semantically similar and are mapped close to each other, which means that the model has a high-level understanding of what the function does and what the query intends to find.
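The idea of document and query vectors being "mapped closely" can be illustrated with a minimal sketch. The summary does not say which similarity measure the model uses, so cosine similarity is an assumption here, and the function names, snippet identifiers, and three-dimensional vectors are all hypothetical toy values rather than output from the actual model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_snippets(query_vec, snippet_vecs, top_k=3):
    """Return up to top_k (snippet_id, score) pairs, most similar first."""
    scored = [
        (snippet_id, cosine_similarity(query_vec, vec))
        for snippet_id, vec in snippet_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical, hand-made embeddings standing in for encoder output.
snippet_vecs = {
    "read_file_lines": [0.9, 0.1, 0.0],
    "sort_array": [0.0, 0.2, 0.9],
    "write_file": [0.6, 0.6, 0.1],
}

# Hypothetical embedding of the query "How to read text file line by line?"
query_vec = [1.0, 0.2, 0.0]

ranked = rank_snippets(query_vec, snippet_vecs, top_k=2)
for snippet_id, score in ranked:
    print(f"{snippet_id}: {score:.3f}")
```

In a real system the vectors would come from trained code and query encoders, and the exhaustive loop would usually be replaced by an approximate nearest-neighbour index once the corpus grows large.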

DATASET
Preprocessing
Filtering
Limitations
Word Embeddings
Encoder
Evaluation Metric
Evaluation Method
THREATS TO VALIDITY
CONCLUSION