Brazilian Data Scientists: Revealing their Challenges and Practices on Machine Learning Model Development

João Lucas Correia, Renato Cerqueira, Alessandro Garcia, Rafael Mello, Rohit Gheyi, Baldoino Fonseca, Márcio Ribeiro, Marcos Kalinowski, Willy Tiengo, Juliana Alves Pereira

doi:10.1145/3439961.3439971

Abstract

Data scientists often develop machine learning models to solve a variety of problems in industry and academia. To build these models, these professionals usually perform activities that are also part of the traditional software development lifecycle, such as eliciting and implementing requirements. One might argue that data scientists could rely on the engineering practices of traditional software development to build machine learning models. However, machine learning development has particular characteristics that may raise challenges and create the need for new practices. The literature lacks a characterization of these challenges and practices from the perspective of data scientists. In this paper, we characterize challenges and practices in the engineering of machine learning models that deserve attention from the research community. To this end, we performed a qualitative study with eight data scientists, with different levels of experience in developing machine learning models, across five different companies. Our findings suggest that: (i) data processing and feature engineering are the most challenging stages in the development of machine learning models; (ii) synergy between data scientists and domain experts is essential in most stages; and (iii) the development of machine learning models lacks the support of a well-engineered process.

Full Text