Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming language. Code LLMs produce impressive results on high-resource programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages that have limited training data available (e.g., OCaml, Racket, and several others). This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach, called MultiPL-T, generates high-quality datasets for low-resource languages, which can then be used to fine-tune any pretrained Code LLM. MultiPL-T translates training data from high-resource languages into training data for low-resource languages in the following way. 1) We use a Code LLM to synthesize unit tests for commented code from a high-resource source language, filtering out faulty tests and code with low test coverage. 2) We use a Code LLM to translate the code from the high-resource source language to a target low-resource language. This gives us a corpus of candidate training data in the target language, but many of these translations are wrong. 3) We use a lightweight compiler to compile the test cases generated in (1) from the source language to the target language, which allows us to filter out obviously wrong translations. The result is a training corpus in the target low-resource language where all items have been validated with test cases. We apply this approach to generate tens of thousands of new, validated training items for five low-resource languages: Julia, Lua, OCaml, R, and Racket, using Python as the source high-resource language. Furthermore, we use an open Code LLM (StarCoderBase) with open training data (The Stack), which allows us to decontaminate benchmarks, train models without violating licenses, and run experiments that could not otherwise be done. Using datasets generated with MultiPL-T, we present fine-tuned versions of StarCoderBase and Code Llama for Julia, Lua, OCaml, R, and Racket that outperform other fine-tunes of these base models on the natural-language-to-code task. We also present Racket fine-tunes for two very recent models, DeepSeek Coder and StarCoder2, to show that MultiPL-T continues to outperform other fine-tuning approaches for low-resource languages. The MultiPL-T approach is easy to apply to new languages, and is significantly more efficient and effective than alternatives such as training longer.
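To make the three-stage pipeline concrete, the sketch below shows how the test-synthesis, translation, and test-based filtering steps could fit together. It is a minimal illustration only: the type and function names (SourceItem, covers_enough, translate, compile_test, passes) are hypothetical and do not come from the paper's released code; the LLM calls, coverage analysis, and test execution are assumed to be supplied as callables.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical sketch of the MultiPL-T filtering stages; the names below are
# illustrative assumptions, not the paper's actual API.

@dataclass
class SourceItem:
    code: str          # commented function in the high-resource source language
    tests: list[str]   # stage 1: unit tests synthesized by a Code LLM

def multipl_t_pipeline(
    items: Iterable[SourceItem],
    covers_enough: Callable[[SourceItem], bool],       # stage 1: drop faulty/low-coverage tests
    translate: Callable[[str], str],                    # stage 2: LLM translation to the target language
    compile_test: Callable[[str], str],                 # stage 3: lightweight test-case compiler
    passes: Callable[[str, list[str]], bool],           # run the candidate against compiled tests
) -> list[str]:
    """Return target-language training items whose translations pass the
    compiled versions of the source-language unit tests."""
    validated = []
    for item in items:
        if not covers_enough(item):        # filter out items with faulty or low-coverage tests
            continue
        candidate = translate(item.code)   # candidate translation; may be wrong
        target_tests = [compile_test(t) for t in item.tests]
        if passes(candidate, target_tests):  # keep only test-validated translations
            validated.append(candidate)
    return validated
```

In this framing, each surviving item is a target-language function that has been validated against translated tests, which is what makes the resulting corpus suitable for fine-tuning.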