Abstract
With significant advances in deep learning, code generation from natural language descriptions has become a prevalent research topic. Existing studies report that these methods achieve high BLEU scores. However, the datasets used in prior work lack diversity, and BLEU is usually the only evaluation metric. To overcome these limitations, we crawled a dataset better suited to code generation from an online judge system and re-ran existing code generation models on it. We evaluate the generated code along five dimensions: lexical similarity, tree similarity, syntactic legality, semantic legality, and functional correctness. This study provides a deeper analysis of the performance of existing code generation methods.
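One reason BLEU alone can be misleading for code is that it measures only token-level n-gram overlap, not whether the program parses or runs. As a rough illustration (not the paper's own evaluation pipeline), the sketch below computes a simplified sentence-level BLEU — modified n-gram precision with a brevity penalty — over tokenized code; all function names here are illustrative:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of the token sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(ref, cand, n):
    # Candidate n-gram counts are clipped by reference counts,
    # so repeating a matching token cannot inflate the score.
    ref_counts = Counter(ngrams(ref, n))
    cand_counts = Counter(ngrams(cand, n))
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    return clipped / max(sum(cand_counts.values()), 1)

def bleu(ref, cand, max_n=2):
    # Geometric mean of n-gram precisions, times a brevity penalty
    # that punishes candidates shorter than the reference.
    precisions = [modified_precision(ref, cand, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_avg)

reference = "return a + b".split()
candidate = "return a + b".split()
print(bleu(reference, candidate))  # identical token sequences score 1.0
```

A candidate like `return a - b` would still score highly under this metric despite being functionally wrong, which motivates the paper's additional syntactic, semantic, and functional checks.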