An analysis of job requirements in labor-market vacancies shows that they are multi-level, multi-word language constructions with complex semantic relationships. The aim of the research is to develop a method for extracting short texts describing knowledge and skills from job-requirement texts that have a complex organizational structure. The method supplements the structure of complex sentences with new relationships, using a BERT neural network model trained on the texts of online vacancies, and then moves from the complex text to a set of simple word combinations. The additional training (fine-tuning) of BERT neural network models from the Sberbank AI laboratory on the texts of online vacancies is described. Two mechanisms for adding new links between the words of a requirement, taking into account knowledge from the subject area, are implemented: a linear one and one that extends the parse tree. In the experiment, several combinations of the listed tools were compared. The combination that showed the best result was 'the BERT + deeppavlov_syntax_parser model + a linear method of adding links'. The applicability of the method was demonstrated on a text corpus of online job requirements. The proposed method performed better than the rule-based approach, which relies on formal rules and grammar rules for natural language analysis. The method makes it possible to quickly identify key changes in the needs of the labor market at the level of individual knowledge and skill requirements.
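The step from a complex requirement to simple word combinations can be illustrated with a toy sketch. It is not the authors' implementation: it uses neither BERT nor deeppavlov_syntax_parser, the parse is written by hand in Universal Dependencies style, and the names `Token`, `added_links`, and `simple_combinations` are hypothetical. It only assumes that one kind of added link attaches each coordinated word to the head of the first conjunct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int    # position in the sentence
    text: str
    head: int   # index of the syntactic head, -1 for the root
    rel: str    # dependency label (Universal Dependencies style)


# Hand-written toy parse of "knowledge of Python, SQL and Git"
# (punctuation omitted). A real pipeline would obtain this from a parser.
TOKENS = [
    Token(0, "knowledge", -1, "root"),
    Token(1, "of", 2, "case"),
    Token(2, "Python", 0, "nmod"),
    Token(3, "SQL", 2, "conj"),
    Token(4, "and", 5, "cc"),
    Token(5, "Git", 2, "conj"),
]


def added_links(tokens):
    """Assumed rule: attach every conjunct to the head of the first
    conjunct, copying the first conjunct's dependency label."""
    by_idx = {t.idx: t for t in tokens}
    links = []
    for t in tokens:
        if t.rel == "conj":
            first = by_idx[t.head]
            if first.head >= 0:  # skip when the first conjunct is the root
                links.append((first.head, t.idx, first.rel))
    return links


def simple_combinations(tokens, rel="nmod"):
    """Return (head, dependent) word pairs for one relation type,
    using both the original and the added links."""
    by_idx = {t.idx: t for t in tokens}
    links = [(t.head, t.idx, t.rel) for t in tokens if t.head >= 0]
    links += added_links(tokens)
    pairs = sorted(((h, d) for h, d, r in links if r == rel), key=lambda p: p[1])
    return [(by_idx[h].text, by_idx[d].text) for h, d in pairs]


print(simple_combinations(TOKENS))
```

On this toy input, the single requirement is split into three pairs that share the head word "knowledge", one each for "Python", "SQL", and "Git". In the described method, the choice of which links to add is informed by a BERT model fine-tuned on vacancy texts rather than by a fixed rule like the one assumed here.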