Abstract
Companies seek ways to retain their professional employees in order to reduce the extra costs of recruiting and training. Predicting whether a particular employee is likely to leave helps the company take preventive decisions. Unlike physical systems, human resource problems cannot be described by a scientific-analytical formula. Therefore, machine learning approaches are the best tools for this purpose. This paper presents a three-stage (pre-processing, processing, post-processing) framework for attrition prediction, with an IBM HR dataset as the case study. Because the dataset contains many features, the “max-out” feature selection method is proposed for dimension reduction in the pre-processing stage and is applied to the IBM HR dataset. The coefficient of each feature in the logistic regression model indicates the importance of that feature in attrition prediction. The results show an improvement in the F1-score when the “max-out” feature selection method is used. Finally, the validity of the parameters is checked by training the model on multiple bootstrap datasets. The mean and standard deviation of the parameters are then analyzed to assess the confidence in the model’s parameters and their stability. The small standard deviation of the parameters indicates that the model is stable and is more likely to generalize well.
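The bootstrap stability check described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic, the logistic regression is a small hand-written gradient-descent fit rather than a library model, and every name (`fit_logistic`, `bootstrap_coefficients`, `make_synthetic`) is hypothetical. The “max-out” selection step is not shown because the abstract does not define it.

```python
import math
import random
import statistics

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def bootstrap_coefficients(X, y, n_boot=20, seed=0):
    """Refit on resampled datasets; return per-coefficient mean and std."""
    rng = random.Random(seed)
    n = len(X)
    fits = []
    for _ in range(n_boot):
        # Sample n rows with replacement to form one bootstrap dataset.
        idx = [rng.randrange(n) for _ in range(n)]
        fits.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    cols = list(zip(*fits))
    return ([statistics.mean(c) for c in cols],
            [statistics.stdev(c) for c in cols])

def make_synthetic(n=200, seed=1):
    # Assumption: synthetic data in which only feature 0 drives the label.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(-1.0 + 2.0 * x[0])
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y

X, y = make_synthetic()
means, stds = bootstrap_coefficients(X, y)
for name, m, s in zip(["bias", "feature_0", "feature_1"], means, stds):
    print(f"{name}: mean={m:.3f} std={s:.3f}")
```

A coefficient whose standard deviation is small relative to its mean is estimated consistently across resamples. A coefficient whose bootstrap spread straddles zero, as is expected for `feature_1` here, contributes little reliable signal.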
Highlights
Human resources are the primary resource and the most critical asset of each company
Machine learning approaches are the best tools for predicting employee attrition
Since there are several features in the dataset, the “max-out” feature selection method is proposed for dimension reduction in the pre-processing stage
Summary
Human resources are the primary resource and the most critical asset of each company. Managers spend a considerable amount of time recruiting capable employees. When an employee leaves, the employer faces delays in its project schedule because a replacement must be recruited and trained. Predicting attrition makes it easier for decision-makers to take proper preventive actions. The precision, recall, and F1-score values reveal that logistic regression performed well in the attrition prediction task, and some indicators of this model are higher than those of other classifiers. The paper computed recall and precision measures based on the class of employees who stayed with the company. This class is more populated, and thereby the performance of the classifier is over-estimated. The authors in [9] tried several classifiers, including logistic regression, AdaBoost, random forest, and gradient boosting, for attrition prediction.
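The point about over-estimation can be made concrete with a small worked example. The counts below are hypothetical and are not taken from the IBM HR dataset or from any cited paper. The sketch computes precision, recall, and F1 twice for the same predictions, once treating "stayed" as the positive class and once treating "left" as the positive class; the helper name `prf` is an assumption for illustration.

```python
def prf(y_true, y_pred, positive):
    """Precision, recall and F1 with `positive` treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical imbalanced test set: 84 employees stayed (0), 16 left (1).
# Assumed predictions: 80 of the stayers and 6 of the leavers are classified correctly.
y_true = [0] * 84 + [1] * 16
y_pred = [0] * 80 + [1] * 4 + [1] * 6 + [0] * 10

stay_scores = prf(y_true, y_pred, positive=0)
left_scores = prf(y_true, y_pred, positive=1)
print("stayed: precision=%.3f recall=%.3f f1=%.3f" % stay_scores)
print("left:   precision=%.3f recall=%.3f f1=%.3f" % left_scores)
```

With these assumed counts, the F1-score is about 0.92 for the "stayed" class but only about 0.46 for the "left" class, even though both describe the same classifier. Reporting only the majority-class figures would therefore overstate how well attrition itself is predicted.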