An Empirical Study of Refactorings and Technical Debt in Machine Learning Systems

Yiming Tang, Ajani Stewart, Mehdi Bagherzadeh, Anita Raja, Raffi Khatchadourian, Rhia Singh

doi:10.1109/icse43902.2021.00033

Open Access

https://doi.org/10.1109/icse43902.2021.00033

Abstract

Machine Learning (ML), including Deep Learning (DL), systems, i.e., those with ML capabilities, are pervasive in today's data-driven society. Such systems are complex; they comprise ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when they are long-lived, but they also exhibit debt specific to ML. Unfortunately, there is a gap in knowledge about how ML systems actually evolve and are maintained. In this paper, we fill this gap by studying refactorings, i.e., source-to-source semantics-preserving program transformations, performed in real-world, open-source software, and the technical debt issues they alleviate. We analyzed 26 projects, consisting of 4.2 MLOC, along with 327 manually examined code patches. The results indicate that (i) developers refactor these systems for a variety of reasons, both specific and tangential to ML; (ii) some refactorings correspond to established technical debt categories, while others do not; and (iii) code duplication is a major cross-cutting theme that particularly involved ML configuration and model code, which was also the most refactored. We also introduce 14 and 7 new ML-specific refactorings and technical debt categories, respectively, and put forth several recommendations, best practices, and anti-patterns. The results can potentially assist practitioners, tool developers, and educators in facilitating long-term ML system usefulness.
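To make the abstract's notion of a refactoring concrete, the following is a minimal sketch of a semantics-preserving transformation that removes duplicated ML configuration code. It is an illustration only and is not taken from the paper or from any of the studied projects; every name (`ModelConfig`, `build_train_config`, `build_eval_config`, the `*_before` functions) and every hyperparameter value is hypothetical.

```python
from dataclasses import asdict, dataclass, replace


# Before: two builders repeat the same hyperparameter literals (hypothetical values).
def build_train_config_before():
    return {"layers": 3, "hidden_units": 128, "dropout": 0.5, "shuffle": True}


def build_eval_config_before():
    return {"layers": 3, "hidden_units": 128, "dropout": 0.0, "shuffle": False}


# After: shared values are declared once; variants override only what differs.
@dataclass(frozen=True)
class ModelConfig:
    layers: int = 3
    hidden_units: int = 128
    dropout: float = 0.5
    shuffle: bool = True


def build_train_config():
    return ModelConfig()


def build_eval_config():
    # Derive the evaluation variant from the training one instead of copying it.
    return replace(build_train_config(), dropout=0.0, shuffle=False)


def behavior_preserved():
    # A refactoring must not change observable behavior: both versions
    # have to produce the same configuration values.
    return (
        asdict(build_train_config()) == build_train_config_before()
        and asdict(build_eval_config()) == build_eval_config_before()
    )
```

In this sketch, changing `hidden_units` requires editing one default rather than two literals, which is the kind of maintenance cost that duplicated configuration code tends to introduce.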

