Abstract

To realize friendly man-machine communication, machines must understand not only the surface expressions of human utterances but also the deeper meanings of human behavior. We began compiling a “paraphrase representation list of compound verbs” as a first step toward investigating and standardizing the lexical items that form part of a “control language for action”. We processed a corpus and vectorized the data using Word2Vec. Using the resulting vectors, we computed the cosine similarity between the compound verbs and the verbs in the corpus, and created a paraphrase representation list from the results. We obtained paraphrase expressions for 1899 of the 3289 compound verbs (including orthographic variants) stored in the compound verb lexicon. With this method we found words that do not exist in the Japanese WordNet. We investigated the words that appear only in the automatically extracted results and found 213 unknown words and 227 new synonymous relationships. Notably, these two counts differ by 14, which means we found 14 words that are stored in the Japanese WordNet but are not listed there as synonyms of the corresponding word. These results suggest that the proposed method is useful for expanding paraphrase relationships that have so far been listed by human intuition.
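The similarity step described above can be illustrated with a minimal sketch. It assumes each compound verb and each corpus verb already has a vector (in the paper these come from Word2Vec; here they are tiny hand-written toy vectors). The function names, the placeholder labels `verb_a` to `verb_c`, and the `top_n` and `threshold` parameters are hypothetical and are not taken from the paper, which does not state how candidates were cut off.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def paraphrase_candidates(compound_vec, verb_vectors, top_n=3, threshold=0.5):
    """Rank corpus verbs by cosine similarity to one compound verb.

    top_n and threshold are illustrative assumptions, not values from the paper.
    """
    scored = [
        (verb, cosine_similarity(compound_vec, vec))
        for verb, vec in verb_vectors.items()
    ]
    kept = [(verb, score) for verb, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy stand-ins for Word2Vec vectors (hypothetical labels and values).
toy_verbs = {
    "verb_a": [1.0, 0.0, 0.0],
    "verb_b": [0.9, 0.1, 0.0],
    "verb_c": [0.0, 1.0, 0.0],
}
toy_compound = [1.0, 0.05, 0.0]

candidates = paraphrase_candidates(toy_compound, toy_verbs)
for verb, score in candidates:
    print(f"{verb}\t{score:.4f}")
```

With real data, the toy dictionary would be replaced by vectors looked up from a trained Word2Vec model, and the ranked candidates would then be compared against an existing resource such as the Japanese WordNet to separate known synonyms from new ones.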

