Abstract
Online job portals are currently the main medium connecting job seekers and employers. These platforms offer job catalogues and a search function that returns jobs whose text literally matches the given keywords. This setup limits retrieval and accuracy, because keyword matching breaks down when the same meaning is written in different ways, such as typos, slang, abbreviations, and synonyms. Enabling more advanced search therefore requires a machine learning model that can automatically detect occupational skill synonyms. In this work, we propose a systematic process for constructing a practical labeled Vietnamese Skill Synonym (ViSki) dataset. We experiment with two approaches to the synonym prediction task: cosine similarity-based and classification-based. Our best model, XGBoost with LaBSE embeddings, achieved 96.15% accuracy and an F1-score of 0.9285.

Keywords: Skill synonym, Word embedding, XGBoost, Siamese network