Exploring the Characteristics of Identifiers: A Large-Scale Empirical Study on 5,000 Open Source Projects

Jingxuan Zhang, Zhiqiu Huang, Junpeng Luo, Jiahui Liang, Siyuan Liu

doi:10.1109/access.2020.3013694

Open Access

https://doi.org/10.1109/access.2020.3013694

Abstract

Informative identifiers are crucial for the comprehensibility and maintainability of programs. Exploring the properties of identifiers and investigating their impact on software artifacts has been an important research focus. However, such work fundamentally requires a comprehensive understanding of the main characteristics of identifiers in the first place, which has unfortunately not been sufficiently studied. For example, it has remained unclear which Part of Speech (POS) tags developers commonly use to define identifiers. To address such open issues, we conducted a large-scale empirical study on the naturalness of identifiers, based on 5,000 open source Java and Android projects, covering five dimensions of identifiers: distributions, compositions, POS tags, lengths, and initializations. The results contain five key findings about identifiers in programs, including the observation that three POS tags (nouns, verbs, and adjectives) are the most commonly used when developers define identifiers. Furthermore, based on our findings, we provide implications and insights for developers, researchers, and Integrated Development Environments (IDEs) in contexts where identifier-related activities are performed or functionalities are enabled.

Highlights

Source code analysis and comprehension are important activities through which developers review and reuse existing knowledge in programs [1].
We report the results of our empirical study, organized according to the types of analyses and the Research Questions (RQs).
We provide the exact number and percentage for each category of identifiers in Table 2 and Table 3; previous studies have not reported these figures for a large set of open source projects.

Summary

Introduction

Source code analysis and comprehension are important activities through which developers review and reuse existing knowledge in programs [1]. An informative, high-quality source code lexicon plays a crucial role in accelerating these activities. Among the various types of source code lexicon, identifiers make up almost 70% of programs [2], [3]. Identifiers reflect developers' understanding of the concepts and behaviors involved in the software under development [5]. Identifiers should uniquely express their corresponding concepts and behaviors within a limited length [6]. Developers therefore need to construct concise, consistent, and meaningful identifiers to improve the comprehensibility of programs, especially for large and complex software.
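To make the "composition" and "length" dimensions concrete, the following is a minimal sketch of how an identifier might be split into its constituent terms and measured. It is an illustration only: this summary does not describe the tooling the authors used, and the function names `split_identifier` and `describe_identifier` and the splitting rules (underscores, dollar signs, case changes, digit runs) are assumptions made for the example.

```python
import re
from typing import Dict, List


def split_identifier(name: str) -> List[str]:
    """Split an identifier into lowercase terms (hypothetical helper).

    Assumed rules: break on '_' and '$', on lower-to-upper case changes,
    keep acronyms such as 'HTTP' together, and treat digit runs as terms.
    """
    terms: List[str] = []
    for chunk in re.split(r"[_$]+", name):
        # Acronym not followed by lowercase | optional capital + lowercase run | digits
        terms.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [t.lower() for t in terms]


def describe_identifier(name: str) -> Dict[str, object]:
    """Report the terms, the number of terms, and the character length."""
    terms = split_identifier(name)
    return {"terms": terms, "term_count": len(terms), "char_length": len(name)}


if __name__ == "__main__":
    for ident in ["maxItemCount", "HTTPServer", "MAX_VALUE", "i"]:
        print(ident, describe_identifier(ident))
```

Under these assumed rules, a camel-case name such as `maxItemCount` yields three terms, while a single-letter name such as `i` yields one. A POS tagger could then be applied to the resulting terms, although which tagger the study used is not stated in this summary.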

Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 4
License type: CC BY 4.0
