Abstract

Measuring semantic similarity is essential to many natural language processing (NLP) tasks. A widely used way to evaluate similarity models is to test their consistency with human judgments on gold-standard datasets, which consist of word pairs with similarity scores assigned by human subjects. However, previous descriptions of how such datasets were constructed are often insufficient: questions such as how the word pairs were selected and whether the assigned scores are reasonable are not clearly addressed. In this paper, we propose a multidisciplinary method for building and validating semantic similarity gold-standard datasets, composed of three steps. First, word pairs are selected based on computational linguistic resources. Second, the similarities of the selected word pairs are scored by human subjects. Finally, Event-Related Potential (ERP) experiments are conducted to test the soundness of the constructed dataset. Using the proposed method, we constructed a Chinese gold-standard word similarity dataset of 260 word pairs and validated its soundness via ERP experiments. Although this paper focuses on constructing a Chinese dataset, the proposed method is applicable to other languages.
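To make the evaluation setting concrete: a similarity model is typically scored against such a gold-standard dataset by computing the rank correlation (e.g. Spearman's rho) between the model's similarity scores and the human scores over all word pairs. The sketch below illustrates this with a self-contained Spearman implementation; the word-pair scores are invented placeholders, not entries from the paper's 260-pair dataset.

```python
def rank(values):
    """Return 1-based ranks for a list of scores, averaging ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = rank(xs), rank(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Invented example: human gold scores vs. a model's similarity scores
# for four hypothetical word pairs. Identical rankings give rho = 1.0.
human = [9.2, 7.5, 3.1, 1.0]
model = [0.88, 0.71, 0.35, 0.12]
print(round(spearman(human, model), 3))  # prints 1.0
```

A model is judged better the closer its rank correlation with the human scores is to 1; this is why the reliability of the human scores themselves, which the ERP validation step addresses, matters for the evaluation.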
