Automatically Detecting Errors in Employer Industry Classification Using Job Postings

Alan Chern, Qiaoling Liu, Faizan Javed, Mahak Goindani, Josh Chao

doi:10.1007/s41019-018-0071-7

Open Access

https://doi.org/10.1007/s41019-018-0071-7

Journal: Data Science and Engineering
Publication Date: Aug 19, 2018
Citations: 5
License type: open-access

Affiliation: Purdue University West Lafayette

Abstract

In the recruitment domain, knowing the employer industry of a job is important for understanding the demand in each industry. The existing system at CareerBuilder uses an employer name normalization system and an employer knowledge base (KB) to infer the employer industry of a job. However, errors may occur both when the employer of a job is computed and when the employer KB is constructed with its industry attributes. Because the KB is huge, detecting these errors manually is not feasible. In this paper, we therefore use machine learning techniques to detect them automatically. Observing that the main jobs posted by an employer often relate to the employer's industry (e.g., truck driver jobs often correspond to employers in the transportation industry), we develop a system that classifies the industry of an employer using job posting data. We aggregate job postings from an employer and derive features from employer names, employer descriptions, job titles, and job descriptions to predict the industry of the employer. Two models are used for classification: (1) support vector machine (SVM) and (2) random forest. Our experiments show that random forest is more effective than SVM at identifying errors in the existing industry classification system, achieving precision 0.69, recall 0.78, and F-score 0.73. In particular, it handles mixed feature vectors better when normalization errors occur. We also observe that our models generally perform better at detecting errors for industries that have higher error rates.
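The reported scores can be checked against each other, since the F-score is the harmonic mean of precision and recall. The following is a minimal sketch of that arithmetic. It assumes that an employer is flagged as an error when the classifier's predicted industry disagrees with the industry stored in the KB; the function names and the toy employer IDs are hypothetical and do not come from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_metrics(flagged: set, true_errors: set) -> tuple:
    """Precision, recall, and F1 of flagged employers against known errors.

    `flagged` holds employers whose predicted industry differs from the KB
    industry (an assumption about how detection is framed); `true_errors`
    holds employers whose KB-inferred industry is actually wrong.
    """
    true_positives = len(flagged & true_errors)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(true_errors) if true_errors else 0.0
    return precision, recall, f_score(precision, recall)


# Consistency check on the figures reported in the abstract.
reported_f = round(f_score(0.69, 0.78), 2)

# Toy example with made-up employer IDs: three of four flags are real errors,
# and three of four real errors are flagged.
toy_metrics = detection_metrics({"a", "b", "c", "d"}, {"b", "c", "d", "e"})
```

With precision 0.69 and recall 0.78, the harmonic mean rounds to 0.73, which matches the F-score stated in the abstract.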

Highlights

Different employers belong to different industries, such as Transportation and Warehousing (Transportation) and Health Care and Social Assistance (Health Care), according to the North American Industry Classification System (NAICS).
We observe that random forest is more effective than support vector machine (SVM) at identifying errors in the existing industry classification system, achieving precision 0.69, recall 0.78, and F-score 0.73. In particular, it better captures the interactions between different signals in the more complex, mixed feature vectors that arise when normalization errors occur.
SVM detected a larger ratio of knowledge base (KB) errors, whereas random forest detected a larger ratio of normalization errors.

Summary

Introduction

Different employers belong to different industries, such as Transportation and Warehousing (Transportation) and Health Care and Social Assistance (Health Care), according to the North American Industry Classification System (NAICS). Knowing the industry of an employer gives insight into the demand in each industry, such as the number of jobs and the top job posters. This is useful for labor market analysis because it shows which industries are important and provide more jobs. The existing system at CareerBuilder uses an employer name normalization system [14,15,16] and an employer knowledge base (KB) to infer the employer industry of a job. The employer KB contains an industry attribute for each employer entity. Errors may occur both when the employer of a job is computed and when the employer KB is constructed with its industry attributes.
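The two error sources are easier to see when the inference path is written out. The following is a minimal sketch, not CareerBuilder's implementation: the lookup table standing in for the normalization system, the KB contents, and all names (`NORMALIZATION_TABLE`, `EMPLOYER_KB`, `normalize_employer`, `infer_job_industry`) are hypothetical.

```python
from typing import Optional

# Hypothetical stand-in for the employer name normalization system:
# maps a cleaned employer string from a job posting to a KB entity ID.
NORMALIZATION_TABLE = {
    "acme trucking inc": "E1",
    "acme trucking": "E1",
    "city general hospital": "E2",
}

# Hypothetical employer KB: each entity carries an industry attribute.
EMPLOYER_KB = {
    "E1": {"name": "Acme Trucking",
           "industry": "Transportation and Warehousing"},
    "E2": {"name": "City General Hospital",
           "industry": "Health Care and Social Assistance"},
}


def normalize_employer(raw_name: str) -> Optional[str]:
    """Map a raw employer name to a KB entity ID, or None if unknown."""
    cleaned = raw_name.lower().replace(".", "").replace(",", "")
    key = " ".join(cleaned.split())  # collapse repeated whitespace
    return NORMALIZATION_TABLE.get(key)


def infer_job_industry(raw_employer_name: str) -> Optional[str]:
    """Infer the industry of a job from its employer name."""
    # Error source 1: normalization may link the job to the wrong entity.
    entity_id = normalize_employer(raw_employer_name)
    if entity_id is None:
        return None
    # Error source 2: the industry attribute stored in the KB may be wrong.
    return EMPLOYER_KB[entity_id]["industry"]


industry = infer_job_industry("ACME Trucking, Inc.")
```

A mistake at either step yields a wrong industry for the job, which is why the two kinds of error (normalization errors and KB errors) are distinguished.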

Similar Papers

Employer Industry Classification Using Job Postings
Mahak Goindani ... Josh Chao
01 Nov 2017

COVID-19, jobs and skills-Exploratory analysis of the job postings in the US and UK healthcare job market.
Himanshu Upadhyay ... Maher Maalouf
PLOS ONE | VOL. 18
20 Jan 2023

Assessment Related Skills and Knowledge Are Increasingly Mentioned in Library Job Postings
Carol Perryman
Evidence Based Library and Information Practice | VOL. 10
06 Mar 2015

Is Working from Home Here to Stay? Evidence from Job Posting Data after the COVID-19 Shock
Jiayin Hu ... Yang Yao
SSRN Electronic Journal
01 Jan 2020
