Abstract

For decades, traditional correlation analysis and regression models have been used in social science research. However, the development of machine learning algorithms makes it possible to apply machine learning techniques for social science research and social issues, which may outperform standard regression methods in some cases. Under the circumstances, this article proposes a methodological workflow for data analysis by machine learning techniques that have the possibility to be widely applied in social issues. Specifically, the workflow tries to uncover the natural mechanisms behind the social issues through a data-driven perspective from feature selection to model building. The advantage of data-driven techniques in feature selection is that the workflow can be built without so much restriction of related knowledge and theory in social science. The advantage of using machine learning techniques in modelling is to uncover non-linear and complex relationships behind social issues. The main purpose of our methodological workflow is to find important fields relevant to the target and provide appropriate predictions. However, to explain the result still needs theory and knowledge from social science. In this paper, we trained a methodological workflow with left-behind children as the social issue case, and all steps and full results are included.

Highlights

  • Traditional data analysis methodology has encountered some challenges in understanding the underlying complex mechanism of social issues

  • This paper proposes a general data-driven methodological workflow, which can be widely used for different social issues, to overcome the weakness of traditional methods on feature selection and capture the non-linearity using machine learning techniques and methods

  • Since Wrapper Approach reconstructs the model for each iteration, a large number of models can be built during the process

Read more

Summary

Introduction

Traditional data analysis methodology has encountered some challenges in understanding the underlying complex mechanism of social issues. With the growing data volume, it becomes harder to quickly select features manually based on prior knowledge, domain expert experience, and literature review. Traditional methods cannot capture complex non-linear relationships underlying the explanatory variables and target variables as they usually assume a linear relationship between variables. Even with large datasets, traditional data analysis methods have difficulty in making full use of the rich information timely compared with machine learning techniques.

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.