Data science and machine learning are widely debated subjects, both in practice and in mainstream research. They often overlap because they share a common purpose: prediction. As a result, data science techniques blend with machine learning techniques in a shared attempt to gain insights from data. Data often contain many possible predictors, not necessarily structured, which makes insights difficult to extract. How to identify the important or relevant features that improve predictive power or better characterize clusters of data is still debated in the scientific literature. This article uses diverse data science and machine learning techniques to identify the most relevant aspects that differentiate data science from machine learning. We used a publicly available dataset that describes multiple users who work in the field of data engineering. From these users, we selected data scientists and machine learning engineers and analyzed the resulting dataset. We designed the feature engineering process and identified the specific features that best distinguish data scientists from machine learning engineers, using the SelectKBest algorithm, neural networks, a random forest classifier, a support vector classifier, cluster analysis, and self-organizing maps. We validated our model through several statistical measures. Better insights lead to better classification: distinguishing between data scientists and machine learning engineers proved more accurate after feature engineering.
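To make the SelectKBest step more concrete, here is a minimal, dependency-free sketch of univariate feature ranking with the one-way ANOVA F statistic, which is the default scoring function (`f_classif`) of scikit-learn's `SelectKBest`. It is not the authors' pipeline. The feature names, the six-row toy table, and the helper functions `anova_f_score` and `select_k_best` are all hypothetical, chosen only to illustrate how features that separate two job-title classes receive higher scores.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of one feature across class labels."""
    classes = sorted(set(labels))
    grand_mean = mean(values)
    # Group the feature values by class label
    groups = {c: [v for v, y in zip(values, labels) if y == c] for c in classes}
    # Between-group sum of squares: how far each class mean is from the overall mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of values inside each class
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups.values())
    df_between = len(classes) - 1
    df_within = len(values) - len(classes)
    if ss_within == 0:
        # Perfect separation gives an unbounded score; a constant feature scores 0
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def select_k_best(rows, labels, feature_names, k):
    """Score every feature column and return the k highest-scoring names."""
    columns = list(zip(*rows))
    scores = {
        name: anova_f_score(list(col), labels)
        for name, col in zip(feature_names, columns)
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: DS = data scientist, MLE = machine learning engineer
feature_names = ["years_coding", "uses_deep_learning", "dashboard_hours"]
rows = [
    (3, 0, 10),
    (4, 0, 12),
    (5, 1, 9),
    (4, 1, 5),
    (3, 1, 4),
    (5, 1, 6),
]
labels = ["DS", "DS", "DS", "MLE", "MLE", "MLE"]

top, scores = select_k_best(rows, labels, feature_names, k=2)
print(top)  # ['dashboard_hours', 'uses_deep_learning']
```

In this toy table, `years_coding` has identical class means and scores 0, so it is discarded, while `dashboard_hours` separates the two groups most strongly. With scikit-learn available, `SelectKBest(f_classif, k=2).fit(X, y)` computes the same F scores; note that its `get_support()` mask follows the original column order rather than ranking the features.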