Abstract

Human-scored gold-standard word similarity datasets normally consist of word pairs with corresponding similarity scores. They are popular resources for evaluating word similarity models, which are essential components of many natural language processing tasks. This paper proposes a novel multidisciplinary method for constructing and validating gold-standard word similarity datasets. The proposed method differs from previous ones in that it draws on methods from three disciplines (psychology, brain science, and computational linguistics) to validate the soundness of the constructed datasets. In particular, to the best of our knowledge, this is the first time event-related potential (ERP) experiments have been used to validate word similarity datasets. Using the proposed method, we constructed a Chinese gold-standard word similarity dataset of 260 word pairs and demonstrated its soundness with the interdisciplinary validation methods. Although this paper focuses only on constructing a Chinese gold-standard dataset, the proposed method is applicable to other languages.
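
As an illustration of how such a dataset is typically used to evaluate a model, the following minimal sketch ranks human scores and model scores and computes their Spearman rank correlation. This metric is a common choice for this kind of evaluation, but the abstract does not state which measure the authors use, so treat it as an assumption. The word pairs, scores, and the function names `average_ranks` and `spearman` are hypothetical and are not taken from the constructed dataset.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical gold-standard entries: (word 1, word 2, mean human score).
gold = [
    ("car", "automobile", 9.2),
    ("coast", "shore", 8.7),
    ("bird", "crane", 6.1),
    ("noon", "string", 0.4),
]

# Hypothetical scores from a model under evaluation, in the same pair order.
model_scores = [0.91, 0.78, 0.80, 0.05]

rho = spearman([score for _, _, score in gold], model_scores)
print(f"Spearman correlation with human judgments: {rho:.2f}")
```

A higher correlation means the model orders the word pairs more like the human raters do. Only the ordering matters here, so the model's scores do not need to be on the same scale as the human scores.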
