Abstract

This paper introduces “CoreNet,” a Korean-Japanese-Chinese aligned wordnet. For the purpose of this paper, the term “wordnet” refers to a network of words. CoreNet is built on a shared semantic hierarchy that originates from NTT Goidaikei (Lexical Hierarchical System). The Korean wordnet was constructed by assigning a semantic category to every sense of each Korean word in a dictionary. The word senses of verbs and adjectives are assigned to the same semantic hierarchy as those of nouns. The usage of each verb sense was examined in corpora and compared with its Japanese translation. The Chinese wordnet, which uses the same semantic hierarchy, was built by comparison with the Korean wordnet. Each sense of a Chinese verb is matched to its Korean counterpart together with its argument structure. Using the same semantic hierarchy for nouns and predicates has several advantages. First, nouns and predicates often share similar surface forms, especially in Chinese. In Korean and Japanese, predicates are typically formed as “Noun+suru” (Japanese) and “Noun+hada” (Korean), comparable to “do+Noun” in English. Second, language generation from conceptual structures is free to realize the surface form as either a noun phrase or a verb phrase. CoreNet has been constructed according to the following principles: mapping of word senses to concepts, a corpus-based approach, multilingualism, and a single concept system for multiple languages. The overall construction flow consists of dictionary-based bootstrapping, incremental similarity-based classification, and manual post-editing. Among the points considered during construction, the following are introduced: multiple concept mapping, verbal nouns, and concept splitting. In multiple concept mapping, a word is mapped into several concepts that together cover the respective meanings of the word. For example, school is an “institution for the instruction of students.” The word school is mapped into three concepts: location, organization, and facility.
For verbal nouns, a word that is a verb is assigned to concepts after it has been transformed into a noun. For example, “write” is transformed into its noun form “writing,” which is mapped into the concept writing, which falls under event. For concept splitting, whenever an inconsistency among concept nodes is discovered, a node may be added. CoreNet differs from NTT Goidaikei in that it maps word senses (not just words) to concepts. This work has been ongoing since 1994.
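The mapping ideas described in the abstract can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not CoreNet's actual data format or tooling: the names `Concept`, `WORD_TO_CONCEPTS`, `VERBAL_NOUNS`, and `concepts_for` are hypothetical, the top node `root` is invented to tie the example together, and the hierarchy holds only the concepts named in the examples (location, organization, facility, event, writing). For brevity the sketch keys on words rather than on individual word senses.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Concept:
    """A node in a toy concept hierarchy (hypothetical structure)."""
    name: str
    parent: Concept | None = None

    def path(self) -> list[str]:
        """Return the concept names from the top node down to this node."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


# Assumed toy hierarchy: "root" is an invented top node for this sketch.
root = Concept("root")
location = Concept("location", root)
organization = Concept("organization", root)
facility = Concept("facility", root)
event = Concept("event", root)
writing = Concept("writing", event)

# Multiple concept mapping: one word may map to several concepts.
WORD_TO_CONCEPTS: dict[str, list[Concept]] = {
    "school": [location, organization, facility],
    "writing": [writing],
}

# Verbal nouns: a verb is first transformed into its noun form.
VERBAL_NOUNS: dict[str, str] = {"write": "writing"}


def concepts_for(word: str) -> list[Concept]:
    """Look up concepts, transforming a verb to its noun form first."""
    noun = VERBAL_NOUNS.get(word, word)
    return WORD_TO_CONCEPTS.get(noun, [])
```

In this sketch, `concepts_for("school")` yields the three concepts from the school example, and `concepts_for("write")` resolves through the noun form “writing” to a concept whose `path()` runs through event.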
