Python API Misuse Mining and Classification Based on Hybrid Analysis and Attention Mechanism

Xincheng He, Xiaojin Liu, Lei Xu

doi:10.1142/s0218194023500432

Abstract

APIs play a crucial role in contemporary software development, streamlining implementation and maintenance. However, improper API usage can cause significant problems such as unexpected outcomes, security vulnerabilities and system crashes. Current methods detect API misuses mainly by comparing established API usage patterns, largely drawn from pre-validated datasets, with target points. Nonetheless, publicly available datasets on API misuses and their corresponding fixes are scarce, which hinders data-driven research. Moreover, most existing techniques concentrate on statically typed languages, such as Java and C, and only a few handle dynamic languages like Python effectively, because dynamic features are difficult to analyze. It is therefore essential to identify Python API misuses and their fixes automatically and promptly. In this paper, we introduce HatPAM, a Hybrid Analysis and Attention-based Python API-Misuse Miner, which (a) provides a method for automatically mining true-positive commits related to Python API-misuse fixes from GitHub and (b) presents the subsequent processing for classifying the Python API misuses in those true-positive cases. In particular, HatPAM applies hybrid static analysis and introduces a structure-based attention mechanism to examine syntactic, semantic and structural features of the Python code context, and it checks the consistency between the code and the developers’ natural intent to significantly reduce false-positive cases. Evaluation on six popular Python projects shows that HatPAM outperforms various state-of-the-art baselines, achieving up to 92.2% Precision, 86.7% Recall and 89.3% F1-score, indicating its capability to identify and classify Python API-misuse commits.
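To make the notion of a Python API misuse and its fix concrete, the following is a minimal sketch of a purely static check, assuming one well-known misuse: calling the built-in `open()` without a `with` statement, so the file may never be closed. It is not HatPAM's method (the abstract does not specify an implementation); the function name `find_unmanaged_open_calls` and the `BUGGY`/`FIXED` snippets are hypothetical and chosen only for illustration.

```python
import ast
from typing import List


def find_unmanaged_open_calls(source: str) -> List[int]:
    """Return line numbers of open() calls that are not the direct
    context expression of a `with` statement (illustrative heuristic)."""
    tree = ast.parse(source)

    # Collect the call nodes that are used directly as `with` items.
    managed = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.With, ast.AsyncWith)):
            for item in node.items:
                managed.add(id(item.context_expr))

    # Flag every bare-name open(...) call that is not managed by `with`.
    flagged = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
            and id(node) not in managed
        ):
            flagged.append(node.lineno)
    return sorted(flagged)


# Hypothetical "before" and "after" versions of a misuse-fixing commit.
BUGGY = '''
f = open("data.txt")
data = f.read()
'''

FIXED = '''
with open("data.txt") as f:
    data = f.read()
'''

if __name__ == "__main__":
    print(find_unmanaged_open_calls(BUGGY))
    print(find_unmanaged_open_calls(FIXED))
```

A check this simple flags code that closes the file explicitly later on and misses aliased or dynamically resolved calls, which illustrates why dynamic language features make pattern-based detection harder for Python than for statically typed languages.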

Full Text