Call Graph Analysis Research Articles

Jupyter notebooks have emerged as the predominant tool for data scientists to develop and share machine learning solutions, primarily using Python as the programming language. Despite their widespread adoption, a significant fraction of these notebooks, when shared on public repositories, suffer from insufficient documentation and a lack of coherent narrative. Such shortcomings compromise the readability and understandability of the notebook. To address them, this paper introduces HeaderGen, a tool-based approach that automatically augments code cells in these notebooks with descriptive markdown headers, derived from a predefined taxonomy of machine learning operations. Additionally, it systematically classifies and displays function calls in line with this taxonomy. The mechanism that powers HeaderGen is an enhanced call graph analysis technique, building upon the foundational analysis available in PyCG. To improve precision, HeaderGen extends PyCG's analysis with return-type resolution of external function calls, type inference, and flow-sensitivity. Furthermore, leveraging type information, HeaderGen employs pattern matching techniques on the code syntax to annotate code cells. We conducted an empirical evaluation on 15 real-world Jupyter notebooks sourced from Kaggle. The results indicate a high accuracy in call graph analysis, with precision at 95.6% and recall at 95.3%. The header generation has a precision of 85.7% and a recall of 92.8% with regard to headers created manually by experts. A user study corroborated the practical utility of HeaderGen, revealing that users found HeaderGen useful in tasks related to comprehension and navigation. To further evaluate the type inference capability of static analysis tools, we introduce TypeEvalPy, a framework for evaluating type inference tools for Python with an in-built micro-benchmark containing 154 code snippets and 845 type annotations in the ground truth. Our comparative analysis of four tools revealed that HeaderGen outperforms the other tools in exact matches with the ground truth.
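
The idea of annotating a code cell by matching its function calls against a taxonomy can be illustrated with a minimal sketch. This is not HeaderGen's implementation: it resolves names purely syntactically from import aliases using Python's standard `ast` module, with no type inference, return-type resolution, or flow-sensitivity. The `TAXONOMY` entries and the helper names `dotted_name` and `headers_for_cell` are hypothetical and chosen only for illustration.

```python
import ast

# Hypothetical taxonomy: fully qualified call name -> header label.
# These entries are illustrative assumptions, not HeaderGen's actual taxonomy.
TAXONOMY = {
    "pandas.read_csv": "Data Loading",
    "sklearn.model_selection.train_test_split": "Data Preparation",
    "sklearn.linear_model.LogisticRegression": "Model Building",
}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def headers_for_cell(source):
    """Collect taxonomy labels for the calls in one code cell (syntactic only)."""
    tree = ast.parse(source)
    aliases = {}  # local name -> fully qualified name
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for a in node.names:
                if a.asname:
                    aliases[a.asname] = a.name
                else:
                    # "import a.b" binds only the top-level name "a"
                    top = a.name.split(".")[0]
                    aliases[top] = top
        elif isinstance(node, ast.ImportFrom) and node.module:
            for a in node.names:
                aliases[a.asname or a.name] = f"{node.module}.{a.name}"

    labels = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name is None:
                continue
            # Expand the leading alias (e.g. "pd" -> "pandas") if one is known.
            head, _, rest = name.partition(".")
            full = aliases.get(head, head) + ("." + rest if rest else "")
            label = TAXONOMY.get(full)
            if label and label not in labels:
                labels.append(label)
    return labels


cell = """
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv("train.csv")
train, test = train_test_split(df)
"""
print(headers_for_cell(cell))
```

A purely syntactic lookup like this misses calls made on returned objects (for example, a method called on the result of `pd.read_csv`), which is the kind of case where the return-type resolution and type inference described in the abstract would be needed.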

Logs in bug reports provide important debugging information for developers. During the debugging process, developers need to study the bug report and examine user-provided logs to understand the system executions that lead to the problem. Intuitively, user-provided logs illustrate the problems that users encounter and may help developers with the debugging process. However, some logs may be incomplete or inaccurate, which can make it difficult for developers to diagnose the bug and thus delay the bug fixing process. In this paper, we conduct an empirical study on the challenges that developers may encounter when analyzing user-provided logs and on the benefits of those logs. In particular, we study both log snippets and exception stack traces in bug reports. We conduct our study on 10 large-scale open-source systems with a total of 1,561 bug reports with logs (BRWL) and 7,287 bug reports without logs (BRNL). Our findings show that: 1) BRWL take longer to resolve (median ranges from 3 to 91 days) than BRNL (median ranges from 1 to 25 days). We also find that reporters may not attach accurate or sufficient logs (i.e., developers often ask for additional logs in the Comments section of a bug report), which extends the bug resolution time. 2) Logs often provide a good indication of where a bug is located. Most bug reports (73%) have overlaps between the classes that generate the logs and their corresponding fixed classes. However, there is still a large number of bug reports where there is no overlap between the logged and fixed classes. 3) Our manual study finds that system execution information is often missing from the logs. Many logs only show the point of failure (e.g., an exception) and do not provide a direct hint about the actual root cause. In fact, through call graph analysis, we find that in 28% of the studied bug reports the fixed classes are reachable from the logged classes but are not visible in the logs attached to the bug reports. In addition, some logging statements are removed from the source code as the system evolves, which may cause further challenges in analyzing the logs. In short, our findings highlight possible future research directions to better help practitioners attach or analyze logs in bug reports.
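
The reachability check behind the third finding can be sketched as a breadth-first search over a call graph. This is a minimal illustration under stated assumptions, not the study's tooling: the graph is assumed to be class-level (an edge means some method of one class calls a method of another), and the class names, the `call_graph` contents, and the helpers `reachable_from` and `classify_fixed_classes` are all hypothetical.

```python
from collections import deque


def reachable_from(call_graph, start_classes):
    """Return every class reachable from start_classes, including the starts."""
    seen = set(start_classes)
    queue = deque(start_classes)
    while queue:
        current = queue.popleft()
        for callee in call_graph.get(current, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def classify_fixed_classes(call_graph, logged, fixed):
    """Split fixed classes by how they relate to the classes seen in the logs."""
    reachable = reachable_from(call_graph, logged)
    return {
        # Fixed classes that appear directly in the logs.
        "overlap": fixed & logged,
        # Fixed classes the logs never mention but that logged classes can reach.
        "reachable_but_not_logged": (fixed & reachable) - logged,
        # Fixed classes with no call path from any logged class.
        "unreachable": fixed - reachable,
    }


# Hypothetical class-level call graph: caller -> set of callees.
call_graph = {
    "RequestHandler": {"SessionManager", "Parser"},
    "SessionManager": {"TokenStore"},
    "Parser": set(),
    "TokenStore": set(),
    "Scheduler": {"Worker"},
}

result = classify_fixed_classes(
    call_graph,
    logged={"RequestHandler"},
    fixed={"TokenStore", "Worker"},
)
print(result)
```

In this made-up example, `TokenStore` falls into the "reachable but not logged" group: the log only names `RequestHandler`, yet a call path leads from it to the class that was eventually fixed.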

Articles published on Call Graph Analysis

Static analysis driven enhancements for comprehension in machine learning notebooks

Demystifying the challenges and benefits of analyzing user-reported logs in bug reports

Consistency Validation Method for Java Fine-Grained Lock Refactoring

JNI Global References Are Still Vulnerable: Attacks and Defenses

Post-Deployment Anomaly Detection and Diagnosis in Networked Embedded Systems by Program Profiling and Symptom Mining

Accelerating program analyses by cross-program training

Call graphs for languages with parametric polymorphism

Speeding up program optimization at link time (original title: Ускорение оптимизации программ во время связывания)

Geant4 Computing Performance Benchmarking and Monitoring

Reusable Function Discovery by Call-Graph Analysis

Efficient compilation strategy for object‐oriented languages under the closed‐world assumption

Static analysis of functional programs

Efficient call graph analysis

Automatic autoprojection of recursive equations with global variables and abstract data types
