Abstract

This paper proposes a methodology that uses a large-scale employment dataset to explore which factors affect employment and how. The methodology combines predictive modelling, variable significance analysis, and VEC analysis. Modelling is based on logistic regression, linear discriminant analysis, neural networks, classification trees, and support vector machines. Following the CRISP-DM standard process model, we train binary classifiers, optimise their hyper-parameters, and measure their performance by prediction accuracy, ROC analysis, and AUC. Using sensitivity analysis, we rank variables by significance in order to identify and measure factors of employment. Using VEC analysis, we further explore how the values of those factors affect employment. Findings show that the best-performing models are neural networks and support vector machines, with a preference for the latter owing to the quality of their VEC. Experiments also suggest that education and age are the primary contributors to correct classification, with specific value distributions discussed in the paper. All results were validated using a rigorous testing procedure that involves training, validation, and test data partitions, combined with multiple runs and three-fold cross-validation. This study addresses some gaps in previous research publications, which lack quantification of the conclusions made.
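The abstract does not say how the sensitivity analysis or the VEC analysis is computed, so the following is only a minimal sketch of one common reading: a one-dimensional sensitivity analysis in which a single input is varied across its range while the other inputs are held at a baseline, and the resulting (input value, model response) pairs are taken as that variable's effect curve. Everything concrete here is an assumption made for illustration: the logistic model, its coefficients, the `household_size` variable, the value ranges, and the function names `vec_curve` and `rank_by_sensitivity` are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical fitted model: a logistic function with made-up coefficients.
# These numbers are illustrative only and are not results from the paper.
COEFFS = {"age": 0.04, "education": 0.9, "household_size": -0.05}
INTERCEPT = -3.0


def predict_proba(row):
    """Return the hypothetical probability of the positive class for one record."""
    z = INTERCEPT + sum(COEFFS[name] * row[name] for name in COEFFS)
    return 1.0 / (1.0 + math.exp(-z))


def vec_curve(model, baseline, ranges, name, levels=7):
    """Vary one input over `levels` evenly spaced values, holding the others
    at their baseline, and return (value, response) pairs."""
    low, high = ranges[name]
    step = (high - low) / (levels - 1)
    curve = []
    for i in range(levels):
        row = dict(baseline)
        row[name] = low + i * step
        curve.append((row[name], model(row)))
    return curve


def rank_by_sensitivity(model, baseline, ranges, levels=7):
    """Score each input by the range of its responses, normalise the scores
    to sum to 1, and return them sorted from most to least influential."""
    spans = {}
    for name in ranges:
        responses = [y for _, y in vec_curve(model, baseline, ranges, name, levels)]
        spans[name] = max(responses) - min(responses)
    total = sum(spans.values())
    scores = {name: (span / total if total else 0.0) for name, span in spans.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Assumed value ranges; the baseline is the midpoint of each range.
RANGES = {"age": (18.0, 65.0), "education": (0.0, 5.0), "household_size": (1.0, 8.0)}
BASELINE = {name: (low + high) / 2 for name, (low, high) in RANGES.items()}

ranking = rank_by_sensitivity(predict_proba, BASELINE, RANGES)
education_curve = vec_curve(predict_proba, BASELINE, RANGES, "education")
```

Because the functions only call `model(row)`, any trained classifier that returns a probability for a record could be substituted for `predict_proba`. The response range is just one possible sensitivity measure; variance or average gradient of the responses would be equally reasonable choices under the same scheme.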

Highlights

  • In recent years, analysing large or big-data sources has become the focus of many studies related to data mining and knowledge discovery

  • With reference to the CRISP-DM modelling stage, this study considers five binary classification algorithms: logistic regression, linear discriminant analysis, neural networks, classification trees, and support vector machines

  • We address some gaps in previous research, which lacks quantification of the conclusions made



Introduction

In recent years, analysing large or big-data sources has become the focus of many studies related to data mining and knowledge discovery. Knowledge obtained discloses relationships between factors associated with employment and recognises their role. The tools and methodologies used in that analysis become a valuable means of empirically validating hypotheses and theoretical considerations in that domain. This study aims to analyse data from a large-scale nationwide survey of households in Ireland in order to identify employment factors empirically and to find how their values affect employment. A major component of this analysis is building machine learning classification models that fit the data. Classification is one of the most prominent and effective supervised learning methods, which allows us to explore the role of demographic characteristics, education
