Abstract

The continuous growth of the online recruitment industry has made candidate screening costly, labour-intensive, and time-consuming. Automating the screening process would expedite candidate selection. Recruiting has recently been moving towards skill-based recruitment, where candidates are ranked according to their number of skills and their competence level and experience in each skill. It is therefore important to create a system that can accurately and automatically extract hard and soft skills from candidates’ resumes and from job descriptions. The task is less complex for hard skills, which in some cases are named entities, but much more challenging for soft skills, which may appear in different linguistic forms depending on the context. In this paper, we propose context-aware sequence classification and token classification models for extracting both hard and soft skills. We use recent state-of-the-art word embedding representations as textual features for various machine learning classifiers. The models are validated on a publicly available job description dataset. Our results indicate that the best-performing sequence classification model uses BERT embeddings together with part-of-speech (POS) and dependency (DEP) tags as input to a logistic regression classifier. The best-performing token classification model uses fine-tuned BERT embeddings with a support vector machine classifier.

