Abstract

This paper proposes two natural language processing models for extracting useful information from multilingual, unstructured (free-form) CV documents. The models identify the relevant document sections (personal information, education, employment, etc.) and, at the lower level of the hierarchy, the specific information within them (names, addresses, roles, skills or competences, etc.). Our approach employs the transformer architecture, specifically a multilingual implementation of its encoder part in the form of the BERT language model. The models are trained and tested on a large, manually annotated CV dataset and achieve high scores on standard accuracy measures. The proposed models are trained end to end and are interpretable; their interpretability was investigated by visualizing the model attention and its vector representations.

Highlights

  • Automatic extraction of useful information from CVs given in free form is a difficult task in the area of natural language processing (NLP)

  • In our work, machine learning techniques are used in the context of NLP to extract the desired information with a high degree of accuracy from CVs in arbitrary format in five languages

  • This paper employs the transformer, an architecture for processing sequential inputs, and the implementation of its encoder part in the form of the Bidirectional Encoder Representations from Transformers (BERT) language model

Summary

INTRODUCTION

Automatic extraction of useful information from CVs given in free form is a difficult task in the area of natural language processing (NLP). A system that could convert a free-form CV into a given highly organized structure would be a very valuable tool for recruiters and job market websites. Useful information in this case includes personal information such as first and last name, residential addresses and spoken languages, as well as information about past employment, education, and skills or competences of the person.

D. Vukadin et al.: Information Extraction from Free-Form CV Documents in Multiple Languages

[...] precision, recall and F1 scores on a dataset consisting of 1686 annotated CVs in five languages: English, Swedish, Norwegian, Finnish and Polish.
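As an illustration of the precision, recall and F1 measures mentioned above, the following minimal sketch scores a set of predicted spans against gold annotations. It assumes exact-match, span-level scoring, which may differ from the evaluation protocol actually used in the paper. The `Span` type, the function name and the label strings (`NAME`, `ADDRESS`, `ROLE`, `SKILL`) are hypothetical and chosen only for this example.

```python
from typing import NamedTuple, Set, Tuple


class Span(NamedTuple):
    """A labelled span: token offsets [start, end) plus an entity label (hypothetical type)."""
    start: int
    end: int
    label: str


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span scoring: a prediction counts only if offsets and label both match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative data only; not taken from the paper's dataset.
gold = {Span(0, 2, "NAME"), Span(5, 9, "ADDRESS"), Span(12, 14, "ROLE")}
predicted = {
    Span(0, 2, "NAME"),
    Span(5, 8, "ADDRESS"),   # wrong boundary, so it does not count as a match
    Span(12, 14, "ROLE"),
    Span(20, 21, "SKILL"),   # spurious prediction
}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Here two of the four predictions match two of the three gold spans, so precision is 2/4, recall is 2/3 and F1 is 4/7. Exact matching is the strictest choice; a partial-overlap criterion would give credit to the address span with the wrong boundary.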

RELATED WORK
EVALUATION
RESULTS FOR DUAL MODEL
[Figure: axis "Number of BERT layers" (ticks 6, 8, 10)]
CONCLUSION