Java Methods Research Articles

Pretrained large language models (LLMs) have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks and to be appropriately specialized to particular domains. Here, we target bioinformatics due to the amount of domain knowledge, algorithms, and data operations this discipline requires. We present BioCoder, a benchmark developed to evaluate LLMs in generating bioinformatics-specific code. BioCoder spans much of the field, covering cross-file dependencies, class declarations, and global variables. It incorporates 1026 Python functions and 1243 Java methods extracted from GitHub, along with 253 examples from the Rosalind Project, all pertaining to bioinformatics. Using topic modeling, we show that the overall coverage of the included code is representative of the full spectrum of bioinformatics calculations. BioCoder incorporates a fuzz-testing framework for evaluation. We have applied it to evaluate various models including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, GPT-3.5, and GPT-4. Furthermore, we fine-tuned one model (StarCoder), demonstrating that our training dataset can enhance the performance on our testing benchmark (by >15% in terms of Pass@K under certain prompt configurations and always >3%). The results highlight two key aspects of successful models: (i) Successful models accommodate a long prompt (>2600 tokens) with full context, including functional dependencies. (ii) They contain domain-specific knowledge of bioinformatics, beyond just general coding capability. This is evident from the performance gain of GPT-3.5/4 compared to the smaller models on our benchmark (50% versus up to 25%). All datasets, benchmark, Docker images, and scripts required for testing are available at: https://github.com/gersteinlab/biocoder and https://biocoder-benchmark.github.io/.


The complexity of software projects and rapid technological evolution mean that developers often need additional help and knowledge to tackle their daily tasks. For this purpose, they often refer to online resources, which are easy to access and contain a wealth of information in various formats. Programming screencasts hosted on platforms such as YouTube are one such online resource that has grown in popularity and adoption over the past decade. These screencasts usually have some metadata, such as a title, a short description, and a set of tags, that should describe the main concepts captured in the video. Unfortunately, metadata are often generic and do not contain detailed information about the code showcased in the tutorial, such as the API calls or graphical user interface (GUI) elements employed, which could lead to developers missing useful tutorials. Having a quick overview of the main code elements and GUIs used in a video tutorial can be very helpful for developers looking for code examples involving specific API calls, or looking to design applications with a specific GUI in mind. To make this information easily available to developers, we propose VID2META, a technique that automatically extracts Java import statements, class names, method information, GUI elements, and GUI screens from videos and makes them available to developers as metadata. VID2META is currently designed to work with Android screencasts. It analyzes video frames using a combination of computer vision, deep learning, optical character recognition, and heuristic-based approaches to identify the needed information in a frame, extract it, and present it to the developer. VID2META has been evaluated in an empirical study on 70 Android programming videos collected from YouTube. The results revealed that VID2META can accurately detect and extract Java and GUI elements from Android programming videos with an average accuracy of 90%.


Articles published on Java Methods

An intelligent java method name recommendation framework via two-phase neural networks

The Correlation between Java Programming and BSICT Graduate Attributes of Information Technology Students at Surigao del Norte State University

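Estudio comparativo de los paradigmas de programación orientada a objetos y programación reactiva en la resolución de Integrales Algebraicas [Comparative study of the object-oriented and reactive programming paradigms in solving algebraic integrals]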

BioCoder: a benchmark for bioinformatics code generation with large language models.

Dataset of Functionally Equivalent Java Methods and Its Application to Evaluating Clone Detection Tools

Method-level Bug Prediction: Problems and Promises

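Высокопроизводительный алгоритм решения проблем связного списка с использованием техники быстрого и медленного указателей [A high-performance algorithm for solving linked-list problems using the fast and slow pointer technique]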

Log statements generation via deep learning: Widening the support provided to developers

Lightweight precise automatic extraction of exception preconditions in java methods

APPLICATION OF DEEP BREATHING TECHNIQUES IN REDUCING POST SECTIO CAESAREA PAIN

Intelligent Visual Representation for Java Code Data in the Field of Software Engineering Based on Remote Sensing Techniques

PassSum: Leveraging paths of abstract syntax trees and self‐supervision for code summarization

G-DCS: GCN-Based Deep Code Summary Generation Model

RECOMMENDING JAVA API METHODS BASED ON PROGRAMMING TASK DESCRIPTIONS BY NOVICE PROGRAMMERS

NURSE AND PATIENT PERCEPTION IN SELF-MONITORING OF DAWN EFFECT TO ENHANCE SELF-MANAGEMENT IN DIABETES MELLITUS PATIENT’S: A QUALITATIVE CASE STUDY

μDep: Mutation-Based Dependency Generation for Precise Taint Analysis on Android Native Code

Method name recommendation based on source code metrics

Bi-LSTM-Based Neural Source Code Summarization

VID2META: Complementing Android Programming Screencasts with Code Elements and GUIs

Revisiting the debate: Are code metrics useful for measuring maintenance effort?
