Software Engineering Research Research Articles

Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming language. Code LLMs produce impressive results on high-resource programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages that have limited training data available (e.g., OCaml, Racket, and several others). This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach, called MultiPL-T, generates high-quality datasets for low-resource languages, which can then be used to fine-tune any pretrained Code LLM. MultiPL-T translates training data from high-resource languages into training data for low-resource languages in the following way. (1) We use a Code LLM to synthesize unit tests for commented code from a high-resource source language, filtering out faulty tests and code with low test coverage. (2) We use a Code LLM to translate the code from the high-resource source language to a target low-resource language. This gives us a corpus of candidate training data in the target language, but many of these translations are wrong. (3) We use a lightweight compiler to compile the test cases generated in (1) from the source language to the target language, which allows us to filter out obviously wrong translations. The result is a training corpus in the target low-resource language where all items have been validated with test cases. We apply this approach to generate tens of thousands of new, validated training items for five low-resource languages: Julia, Lua, OCaml, R, and Racket, using Python as the source high-resource language. 
Furthermore, we use an open Code LLM (StarCoderBase) with open training data (The Stack), which allows us to decontaminate benchmarks, train models without violating licenses, and run experiments that could not otherwise be done. Using datasets generated with MultiPL-T, we present fine-tuned versions of StarCoderBase and Code Llama for Julia, Lua, OCaml, R, and Racket that outperform other fine-tunes of these base models on the natural language to code task. We also present Racket fine-tunes for two very recent models, DeepSeek Coder and StarCoder2, to show that MultiPL-T continues to outperform other fine-tuning approaches for low-resource languages. The MultiPL-T approach is easy to apply to new languages, and is significantly more efficient and effective than alternatives such as training longer.
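The three steps in the abstract above can be read as a generate-translate-filter loop. The following is a minimal sketch of that loop, not the authors' implementation: every name in it (`ValidatedItem`, `build_validated_corpus`, and the four callables passed in) is hypothetical, the Code LLM and test compiler are replaced by stand-in functions, and keeping only the first passing translation per source item is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class ValidatedItem:
    source_code: str   # function in the high-resource language
    target_code: str   # translation that passed the compiled tests
    target_tests: str  # tests compiled into the target language


def build_validated_corpus(
    sources: Iterable[str],
    generate_tests: Callable[[str], Optional[str]],
    translate: Callable[[str], List[str]],
    compile_tests: Callable[[str], str],
    passes: Callable[[str, str], bool],
) -> List[ValidatedItem]:
    """Hypothetical sketch of a generate-translate-filter pipeline.

    All four callables are assumptions standing in for a Code LLM,
    a test compiler, and a sandboxed test runner.
    """
    corpus: List[ValidatedItem] = []
    for src in sources:
        # Step 1: synthesize tests; None models "faulty tests or low coverage".
        tests = generate_tests(src)
        if tests is None:
            continue
        # Step 3 (first half): compile source-language tests to the target language.
        target_tests = compile_tests(tests)
        # Step 2: ask for candidate translations, many of which may be wrong.
        for candidate in translate(src):
            # Step 3 (second half): keep a candidate only if the tests pass.
            if passes(candidate, target_tests):
                corpus.append(ValidatedItem(src, candidate, target_tests))
                break  # assumption: one validated translation per source item
    return corpus


def demo() -> List[ValidatedItem]:
    """Toy run with stub callables; no model or interpreter is invoked."""
    sources = ["def add(a, b): return a + b", "def untested(): pass"]

    def generate_tests(src: str) -> Optional[str]:
        return "assert add(1, 2) == 3" if "add" in src else None

    def translate(src: str) -> List[str]:
        return ["wrong translation", "function add(a, b) return a + b end"]

    def compile_tests(tests: str) -> str:
        return "assert(add(1, 2) == 3)"

    def passes(code: str, tests: str) -> bool:
        # Stub check; a real pipeline would execute the tests in a sandbox.
        return code.startswith("function add")

    return build_validated_corpus(
        sources, generate_tests, translate, compile_tests, passes
    )
```

Calling `demo()` drops the second source item (no usable tests) and the first candidate translation (fails the stub check), leaving a single validated item. Passing the model and the runner in as callables keeps the filtering logic testable without any external service.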


Context: Artificial Intelligence (AI) is pervasive in several application domains and promises to become even more widespread in the coming decades. Developing high-quality AI-enabled systems (software systems embedding one or more AI components, algorithms, and models) could introduce critical challenges for mitigating specific risks related to the systems' quality. Such development alone is insufficient to fully address socio-technical consequences and the need for rapid adaptation to evolutionary changes. Recent work proposed the concept of AI technical debt, a potential liability incurred when developing AI-enabled systems that can affect overall system quality. While the problem of AI technical debt is rapidly gaining the attention of the software engineering research community, scientific knowledge that contributes to understanding and managing it is still limited. Objective: In this paper, we leverage the expertise of practitioners to offer useful insights to the research community, aiming to enhance researchers' awareness about the detection and mitigation of AI technical debt. Our ultimate goal is to empower practitioners by providing them with tools and methods. Additionally, our study sheds light on novel aspects that practitioners might not be fully acquainted with, contributing to a deeper understanding of the subject. Method: We conduct a survey study involving 53 AI practitioners, in which we collect information on the practical prevalence, severity, and impact of AI technical debt issues affecting the code and the architecture, as well as the strategies applied by practitioners to identify and mitigate them. 
Results: The key findings of the study reveal the multiple impacts that AI technical debt issues may have on the quality of AI-enabled systems (e.g., the high negative impact that Undeclared Consumers has on security, whereas Jumbled Model Architecture can make the code hard to maintain) and the little support practitioners have to deal with them, which is limited to manual effort for identification and refactoring. Conclusion: We conclude the article by distilling lessons learned and actionable insights for researchers.


Articles published on Software Engineering Research

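Editorial for the special issue of the 12th International Conference on Software Engineering Research and Innovation (Editorial para el número especial de la 12a Conferencia Internacional sobre Investigación e Innovación en Ingeniería de Software)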

A survey on Cryptoagility and Agile Practices in the light of quantum resistance

Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs

The Trailer of the ACM 2030 Roadmap for Software Engineering

Mobile Software Engineering is Coming to an End (Like All Good Things Must)

7th International Workshop on Software-intensive Business (IWSiB 2024): Software-intensive Business in the Era of Generative Artificial Intelligence

A Meta-Study of Software-Change Intentions

Leveraging Large Language Models for Automating Inductive Qualitative Coding: A Comparative Study of Prompt Engineering Techniques

Software stewardship and advancement of a high-performance computing scientific application: QMCPACK

Guidelines for using financial incentives in software-engineering experimentation

Enhancing effort estimation in global software development using a unique combination of Neuro Fuzzy Logic and Deep Learning Neural Networks (NFDLNN)

SERP4IoT'24 Workshop Report

Can GPT-4 Replicate Empirical Software Engineering Research?

Technical debt in AI-enabled systems: On the prevalence, severity, impact, and management strategies for code and architecture

Open Archaeology, Open Source? Collaborative practices in an emerging community of archaeological software engineers

Challenges, adaptations, and fringe benefits of conducting software engineering research with human participants during the COVID-19 pandemic

Transformers and meta-tokenization in sentiment analysis for software engineering

Reporting case studies in systematic literature studies—An evidential problem

Supporting Emotional Intelligence, Productivity and Team Goals while Handling Software Requirements Changes

Enablers and Barriers of Empathy in Software Developer and User Interactions: A Mixed Methods Case Study
