Occupation-specific job tasks (OSTs) refer to the duties, responsibilities, and activities associated with a particular occupation, which define the core functions and performance expectations for those engaged in that profession. Efficient recognition and extraction of OSTs from large-scale job description data are essential for establishing a continually updated occupational information system (OIS), such as O*NET, which serves as critical tools for advancing research in work and labor markets. However, this task presents substantial challenges due to its heavy reliance on domain experts for the labor-intensive annotation of job postings, rendering the process time-consuming and difficult to scale for large-scale implementation. To this end, in this paper, we present COTR , a novel data-driven framework designed for the efficient recognition of OSTs from job postings, capable of continually identifying new tasks through class-incremental learning. Specifically, we first employ large language models (LLMs) and prompt learning to develop a three-phase process—“expansion, translation, and generation”—that addresses the critical challenge of the absence of predefined OSTs in non-English labor market data, leveraging O*NET as a foundational reference. Subsequently, we introduce a BERT-based model for OST recognition, incorporating a uniquely designed pair-wise loss function that distills valuable insights from ChatGPT or other LLMs, thereby substantially enhancing recognition performance. In addition, to achieve cost-effective training data annotation, we develop an LLM-based coarse-to-fine candidate OSTs generation algorithm, integrating contrastive active learning to optimize the annotation process through human-machine collaboration. Notably, we design a supervised fine-tuning strategy with a novel encoding technique to optimize LLMs, improving the recall rate of the generated candidate OSTs and achieving up to a 343-fold increase in annotation efficiency compared to traditional manual expert annotation in our experiments. Afterward, we propose an efficient class-incremental learning method that incorporates an out-of-distribution (OOD) detection module for identifying potential novel OSTs and a fine-tuning module to extend the model’s recognition capabilities to include newly discovered tasks. Finally, we construct two real-world datasets using job posting data collected from the labor markets of China and the United States, respectively. Extensive experiments on the real-world datasets, along with two publicly available datasets, have demonstrated the effectiveness of the proposed COTR. Furthermore, several case studies showcase the significant benefits of COTR for various downstream applications in labor market analysis, including analyzing the evolving demand for OSTs, assessing the value of OSTs, and recognizing the relationships between OSTs and associated skills.
Read full abstract