IranITJobs2021: a Dataset for Analyzing Iranian Online IT Job Advertisements Collected Using a New Crowdsourcing-based Dataset Gathering Process

Fakhroddin Noorbehbahani, Nikta Akbarpour, Mohammad Reza Saeidi

doi:10.1109/iccke57176.2022.9960084

Abstract

Gathering and preparing high-quality data is one of the most significant and expensive steps in data analytics. Crowdsourcing is an efficient way to create datasets for machine learning and data science applications. However, a proper crowdsourcing process is vital to ensure the quality of the collected data. In this paper, a new crowdsourcing-based process for creating high-quality datasets is proposed, comprising pre-gathering, gathering, and post-gathering phases. Today, employers and job seekers rely on online job postings and social media sites for recruitment more than ever before. Consequently, a huge volume of job posting data is available, which reinforces the need for data visualization and data analytics to extract valuable insights that support better decision-making. Although several online job advertisement datasets exist for analyzing job demand and requirements, there is no such dataset for the IT job market in Iran. In this paper, IranITJobs2021, an online IT job posting dataset produced using the proposed dataset gathering process, is presented. IranITJobs2021 includes job advertisements related to information technology from August 2019 to January 2021. The dataset, which is publicly available, comprises 1300 instances and 13 features. IranITJobs2021 can be analyzed to find valuable patterns in job requirements and skills in the field of information technology. Furthermore, the proposed dataset gathering process can be applied to create datasets efficiently.

Full Text