The investigation and prediction for salary trends in the data science industry

Wentao Jiang

doi:10.54254/2755-2721/50/20241102

Abstract

This study uses machine learning techniques to analyze salary trends in the data science industry over the last three years. It first presents an overview of four machine learning models: Random Forests, eXtreme Gradient Boosting (XGBoost), Neural Networks, and Support Vector Regression (SVR), outlining their fundamental principles and characteristics. It then collects and preprocesses salary data from the data science sector for the past three years and applies feature engineering to it. The four models are employed for salary prediction, and their outcomes are examined in detail. A comparative analysis of each model's performance identifies their respective strengths and weaknesses. Finally, the study summarizes its findings and discusses potential directions for future research. The novelty of this research lies in applying diverse machine learning models to forecast salaries within the data science industry, together with a comprehensive comparison and evaluation of these models. The main conclusion is that XGBoost performs best in salary prediction, while neural networks are more accurate and complex, and SVR has limited applicability. Future research prospects include improving the accuracy and interpretability of the models and exploring additional features and data processing methods to make salary prediction in the data science industry more accurate and practical.

Full Text