Abstract

In the recruitment domain, knowing the employer industry of jobs is important for gaining insight into the demand in each industry. The existing system at CareerBuilder uses an employer name normalization system and an employer knowledge base (KB) to infer the employer industry of a job. However, errors may occur both when computing the employer of a job and when constructing the employer KB with its industry attributes. Since the KB is huge, it is not possible to detect these errors manually. Therefore, in this paper we use machine learning techniques to detect them automatically. Based on the observation that the main jobs posted by an employer often relate to the employer's industry (e.g., truck driver jobs often correspond to employers in the transportation industry), we develop a system that classifies the industry of an employer using job posting data. We aggregate the job postings of each employer and derive features from employer names, employer descriptions, job titles, and job descriptions to predict the industry of the employer. Two models are used for classification: (1) support vector machine (SVM) and (2) random forest. Our experiments show that random forest is more effective than SVM at identifying errors in the existing industry classification system, achieving a precision of 0.69, a recall of 0.78, and an f-score of 0.73. In particular, it better handles mixed feature vectors when normalization errors occur. We also observe that our models generally perform better at detecting errors for industries with higher error rates.
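The following is a minimal sketch of the aggregate-then-compare idea described above. It assumes a simplified posting record and uses a toy keyword scorer in place of the SVM or random forest, so the paper's actual features and models are not reproduced here. All names (`aggregate_postings`, `keyword_predict`, `flag_errors`, `precision_recall_f1`) and the data are hypothetical. An employer is flagged as a suspected error when its predicted industry disagrees with the KB attribute. The flags are then scored with precision, recall, and F1 against a set of known errors.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical posting record: (employer, employer_description, job_title, job_description)
Posting = Tuple[str, str, str, str]


def aggregate_postings(postings: List[Posting]) -> Dict[str, str]:
    """Build one text document per employer from all of its postings."""
    parts: Dict[str, List[str]] = defaultdict(list)
    for employer, employer_desc, title, job_desc in postings:
        if not parts[employer]:
            # Employer-level fields are added once per employer.
            parts[employer].extend([employer, employer_desc])
        # Job-level fields are appended for every posting.
        parts[employer].extend([title, job_desc])
    return {employer: " ".join(p) for employer, p in parts.items()}


# Toy stand-in for a trained classifier (NOT an SVM or random forest):
# it counts industry keywords and returns the best-scoring industry.
KEYWORDS: Dict[str, Set[str]] = {
    "Transportation": {"truck", "driver", "freight"},
    "Health Care": {"nurse", "patient", "clinic"},
}


def keyword_predict(doc: str) -> str:
    tokens = doc.lower().split()
    scores = {ind: sum(t in kws for t in tokens) for ind, kws in KEYWORDS.items()}
    return max(scores, key=scores.get)


def flag_errors(docs: Dict[str, str], kb_industry: Dict[str, str],
                predict: Callable[[str], str]) -> Set[str]:
    """Flag employers whose predicted industry disagrees with the KB attribute."""
    return {emp for emp, doc in docs.items() if predict(doc) != kb_industry.get(emp)}


def precision_recall_f1(flagged: Set[str], true_errors: Set[str]) -> Tuple[float, float, float]:
    """Score the flagged employers against the set of known KB errors."""
    tp = len(flagged & true_errors)
    p = tp / len(flagged) if flagged else 0.0
    r = tp / len(true_errors) if true_errors else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy data: the KB wrongly lists "Acme Freight" under Health Care.
postings: List[Posting] = [
    ("Acme Freight", "Freight hauling company", "Truck Driver", "Drive truck routes"),
    ("Acme Freight", "Freight hauling company", "CDL Truck Driver", "Long-haul truck"),
    ("City Clinic", "Outpatient clinic", "Registered Nurse", "Patient care"),
]
kb = {"Acme Freight": "Health Care", "City Clinic": "Health Care"}

docs = aggregate_postings(postings)
flagged = flag_errors(docs, kb, keyword_predict)
print(flagged)  # {'Acme Freight'}
print(precision_recall_f1(flagged, {"Acme Freight"}))  # (1.0, 1.0, 1.0)
```

In a real setting, `keyword_predict` would be replaced by a model trained on features extracted from the aggregated text. Any callable that maps a document to an industry label fits the `predict` parameter.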

Highlights

  • Different employers belong to different industries, such as Transportation and Warehousing (Transportation) and Health Care and Social Assistance (Health Care), according to the North American Industry Classification System (NAICS)

  • We observe that random forest is more effective than support vector machine (SVM) at identifying errors in the existing industry classification system, achieving a precision of 0.69, a recall of 0.78, and an f-score of 0.73. In particular, it better captures the interactions between different signals in more complex, mixed feature vectors when normalization errors occur

  • SVM detected a larger proportion of knowledge base (KB) errors, whereas random forest detected a larger proportion of normalization errors
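Assuming the reported f-score is the standard F1 score, which is the harmonic mean of precision P and recall R, the three reported figures for random forest are mutually consistent:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.69 \times 0.78}{0.69 + 0.78}
    = \frac{1.0764}{1.47}
    \approx 0.732 \approx 0.73
```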

Summary

Introduction

Different employers belong to different industries, such as Transportation and Warehousing (Transportation) and Health Care and Social Assistance (Health Care), according to the North American Industry Classification System (NAICS). Knowing the industry of an employer provides insight into the demand in each industry, such as the number of jobs and the top job posters. This is useful for labor market analysis because it reveals which industries are important and which provide more jobs. The existing system at CareerBuilder uses an employer name normalization system [14,15,16] and an employer knowledge base (KB) to infer the employer industry of a job. The employer KB contains an industry attribute for each employer entity. Errors may occur both when computing the employer of a job and when constructing the employer KB with its industry attributes.
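The two error sources named above can be illustrated with a toy version of the lookup chain: a raw employer string is resolved to a normalized entity, and that entity's KB industry attribute is returned. This is a hypothetical sketch, not CareerBuilder's system. The alias table, KB entries, and function names (`normalize`, `infer_industry`, `error_source`) are invented for illustration. When the inferred industry is wrong, the sketch attributes the error to normalization if the employer resolved to the wrong entity, and to the KB otherwise.

```python
from typing import Callable, Dict, Optional

# Hypothetical normalizer: maps a raw employer string to a KB entity (or None).
ALIASES: Dict[str, str] = {
    "acme freight inc.": "Acme Freight",
    "acme": "Acme Bank",  # ambiguous alias resolved to the wrong entity
    "city clinic llc": "City Clinic",
}


def normalize(raw: str) -> Optional[str]:
    return ALIASES.get(raw.strip().lower())


# Hypothetical KB industry attributes; "Acme Freight" carries a wrong attribute.
KB: Dict[str, str] = {
    "Acme Freight": "Health Care",
    "Acme Bank": "Finance and Insurance",
    "City Clinic": "Health Care",
}


def infer_industry(raw: str, normalizer: Callable[[str], Optional[str]],
                   kb: Dict[str, str]) -> Optional[str]:
    entity = normalizer(raw)  # step 1: employer name normalization
    if entity is None:
        return None
    return kb.get(entity)  # step 2: KB industry lookup


def error_source(raw: str, normalizer: Callable[[str], Optional[str]],
                 kb: Dict[str, str], gold_entity: str,
                 gold_industry: str) -> Optional[str]:
    """Attribute a wrong inferred industry to normalization or to the KB."""
    if infer_industry(raw, normalizer, kb) == gold_industry:
        return None  # inferred industry is correct
    if normalizer(raw) != gold_entity:
        return "normalization"  # resolved to the wrong employer entity
    return "kb"  # right entity, wrong industry attribute


print(error_source("Acme Freight Inc.", normalize, KB, "Acme Freight", "Transportation"))  # kb
print(error_source("Acme", normalize, KB, "Acme Freight", "Transportation"))  # normalization
print(error_source("City Clinic LLC", normalize, KB, "City Clinic", "Health Care"))  # None
```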
