Abstract

Taxonomies are the mainstay of the semantic web as they aim at organising knowledge in concepts linked by IS-A relationships. However, keeping such hierarchies updated and able to represent the domain from which they have been drawn is still a time-consuming, costly and error prone activity. Here, word embeddings have proven to be effective in catching lexicon and semantic similarities to enrich taxonomies from text data. This, in turn, would require to evaluate the generated embeddings to estimate the extent to which they encode the semantic similarity derived from the hierarchy itself. In this paper, we propose and implement MEET-LM, a methodology that aims at generating and evaluating embeddings from a text corpus preserving the co-hyponymy relations synthesised from a domain-specific taxonomy. We apply MEET-LM to a real-life dataset of 2M+ vacancies related to ICT-jobs, framed within the research activities of an EU project that collects millions of Online Job Vacancies and classifies them within the European standard hierarchy ESCO. To show MEET-LM is useful in practice, we also trained a neural network to classify co-hyponym relations using the selected embeddings as features. Our experiments reach 99.4% of accuracy and 86.5% of f1-score.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.