Revealing the Unseen: AI Chain on LLMs for Predicting Implicit Data Flows to Generate Data Flow Graphs in Dynamically-Typed Code

Qing Huang, Yong Chen, Jinshan Zeng, Jieshan Chen, Zhenchang Xing, Zhiwen Luo, Xiwei Xu

doi:10.1145/3672458

Abstract

Data flow graphs (DFGs) capture definitions (defs) and uses across program blocks and are a fundamental program representation for program analysis, testing, and maintenance. However, dynamically-typed programming languages like Python present implicit data flow issues that make it challenging to determine def-use flow information at compile time. Static analysis tools such as Soot and WALA are inadequate for handling these issues, and manually enumerating comprehensive heuristic rules is impractical. Large pre-trained language models (LLMs) offer a potential solution: their language understanding and pattern matching abilities allow them to predict implicit data flow by analyzing code context and the relationships between variables, functions, and statements. We propose leveraging LLMs’ in-context learning ability to learn implicit rules and patterns from code representation and contextual information to solve implicit data flow problems. To further enhance the accuracy of LLMs, we design a five-step Chain of Thought (CoT) and break it down into an AI chain, with each step corresponding to a separate AI unit, to generate accurate DFGs for Python code. We thoroughly assess our approach's performance, demonstrating the effectiveness of each AI unit in the AI chain. Compared to static analysis, our method achieves 82% higher def coverage and 58% higher use coverage in DFG generation on implicit data flow. We also show that each unit in the AI chain is indispensable. Overall, our approach offers a promising direction for building software engineering tools on foundation models, one that eliminates significant engineering and maintenance effort and shifts the focus to identifying problems for AI to solve.
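
To make the notion of an implicit data flow concrete, the following is a minimal illustrative sketch; it is not taken from the paper. It assumes a hypothetical `Config` class whose `timeout` attribute is defined through `setattr` rather than through a plain assignment. The helper functions `explicit_attribute_defs` and `attribute_uses` are also hypothetical: they form a deliberately naive, purely syntactic checker built on Python's standard `ast` module, standing in for rule-based static analysis in general and not for Soot or WALA specifically.

```python
import ast

# Hypothetical snippet: cfg.timeout is defined implicitly via setattr,
# then used through ordinary attribute access.
SOURCE = '''
class Config:
    pass

cfg = Config()
setattr(cfg, "timeout", 30)   # implicit def of cfg.timeout
limit = cfg.timeout * 2       # use of cfg.timeout
'''


def explicit_attribute_defs(source):
    """Collect attribute names assigned with plain `obj.attr = ...` syntax."""
    defs = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Attribute):
                    defs.add(target.attr)
    return defs


def attribute_uses(source):
    """Collect attribute names that are read (loaded) anywhere in the source."""
    uses = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and isinstance(node.ctx, ast.Load):
            uses.add(node.attr)
    return uses


syntactic_defs = explicit_attribute_defs(SOURCE)
uses = attribute_uses(SOURCE)

# Uses for which the syntactic checker found no matching definition.
unmatched_uses = uses - syntactic_defs

# At run time the flow exists: the value written by setattr reaches `limit`.
namespace = {}
exec(SOURCE, namespace)

print("syntactic defs:", syntactic_defs)
print("unmatched uses:", unmatched_uses)
print("runtime value of limit:", namespace["limit"])
```

The syntactic pass reports a use of `timeout` with no matching def, even though the program runs and the value flows from the `setattr` call to `limit`. Closing gaps of this kind with hand-written rules for every dynamic feature is the effort the abstract describes as impractical.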

More From: ACM Transactions on Software Engineering and Methodology
