A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects

Andrew J Simmons, Akshat Bajaj, Rajesh Vasa, Scott Barnett, Jessica Rivera-Villicana

doi:10.1145/3382494.3410680

Abstract

Background: Meeting the growing industry demand for Data Science requires cross-disciplinary teams that can translate machine learning research into production-ready code. Software engineering teams value adherence to coding standards as an indication of code readability, maintainability, and developer expertise. However, there are no large-scale empirical studies of coding standards focused specifically on Data Science projects.

Aims: This study investigates the extent to which Data Science projects follow coding standards. In particular, which standards are followed, which are ignored, and how does this differ from traditional software projects?

Method: We compare a corpus of 1048 Open-Source Data Science projects to a reference group of 1099 non-Data Science projects with a similar level of quality and maturity.

Results: Data Science projects have a significantly higher rate of functions that use an excessive number of parameters and local variables. Data Science projects also follow different variable naming conventions from non-Data Science projects.

Conclusions: The differences indicate that Data Science codebases are distinct from traditional software codebases and do not follow traditional software engineering conventions. Our conjecture is that this may be because traditional software engineering conventions are inappropriate in the context of Data Science projects.
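The Results describe functions with an excessive number of parameters and local variables. The abstract does not say which tool or limits the study used. The following is therefore only a minimal sketch of how such a check could work on Python source. It assumes hypothetical limits of 5 parameters and 15 local variables. These match Pylint's default max-args and max-locals settings, but that is an illustrative choice and not the study's confirmed configuration. The names find_violations, count_locals, all_params and the rule labels are all hypothetical. The local-variable count is an approximation: it counts distinct assigned names and skips nested scopes.

```python
from __future__ import annotations

import ast

# Hypothetical limits for illustration only. They mirror Pylint's default
# max-args (5) and max-locals (15); the study's own settings may differ.
MAX_PARAMS = 5
MAX_LOCALS = 15

# Nodes that open a new scope; names bound inside them are not counted
# as locals of the enclosing function.
_SCOPE_NODES = (
    ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda, ast.ClassDef,
    ast.ListComp, ast.SetComp, ast.DictComp, ast.GeneratorExp,
)


def all_params(func: ast.FunctionDef | ast.AsyncFunctionDef) -> list[ast.arg]:
    """Return every parameter: positional-only, regular, keyword-only, *args, **kwargs."""
    a = func.args
    params = [*a.posonlyargs, *a.args, *a.kwonlyargs]
    if a.vararg is not None:
        params.append(a.vararg)
    if a.kwarg is not None:
        params.append(a.kwarg)
    return params


def _walk_own_scope(nodes: list[ast.stmt]):
    """Yield nodes under the given statements without entering nested scopes."""
    stack = [n for n in nodes if not isinstance(n, _SCOPE_NODES)]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(
            child for child in ast.iter_child_nodes(node)
            if not isinstance(child, _SCOPE_NODES)
        )


def count_locals(func: ast.FunctionDef | ast.AsyncFunctionDef) -> int:
    """Approximate local count: distinct names assigned in the function body,
    excluding parameters and names declared global or nonlocal."""
    assigned: set[str] = set()
    declared: set[str] = set()
    for node in _walk_own_scope(func.body):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            assigned.add(node.id)
        elif isinstance(node, (ast.Global, ast.Nonlocal)):
            declared.update(node.names)
    params = {p.arg for p in all_params(func)}
    return len(assigned - params - declared)


def find_violations(source: str) -> list[tuple[str, str, int]]:
    """Return (function name, rule, count) for every function over a limit."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            n_params = len(all_params(node))
            if n_params > MAX_PARAMS:
                results.append((node.name, "excessive-parameters", n_params))
            n_locals = count_locals(node)
            if n_locals > MAX_LOCALS:
                results.append((node.name, "excessive-locals", n_locals))
    return results


if __name__ == "__main__":
    sample = '''
def train(X, y, lr, epochs, batch_size, seed):
    model = None
    return model

def add(a, b):
    total = a + b
    return total
'''
    print(find_violations(sample))
```

Run on the sample, this should print [('train', 'excessive-parameters', 6)]. The function add stays within both limits. A study-scale comparison would apply such a check across every file in each corpus. It would then compare the rate of flagged functions between the Data Science group and the reference group.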

Similar Papers

Lessons learned to improve the UX practices in agile projects involving data science and process automation
Bruna Ferreira ... Marcos Kalinowski
Information and Software Technology | VOL. 155
01 Mar 2023

Demystifying Data Science Projects: A Look on the People and Process of Data Science Today
Timo Aho ... Sezin Yaman
01 Jan 2020

AutoDS: Towards Human-Centered Automation of Data Science
Dakuo Wang ... Josh Andres
06 May 2021

Trust, but Verify
Maris Sekar
01 Jan 2021