Abstract

Identifying noun compounds in natural language documents is important for handling their linguistic properties at the semantic, syntactic, and pragmatic levels. In this study, we introduce a knowledge-based method for incorporating noun compounds into distributional semantic representation approaches. Wikipedia is exploited as a knowledge resource for extracting noun compounds based on its structural features, and its category system is then used to classify the extracted noun compounds as linguistic terms or named entities. Next, a look-up list technique is employed to identify the noun compounds when the semantics of terms are extracted with the corpus-based semantic representation approach. To obtain the semantic representations, we use five well-known distributional approaches: latent semantic analysis (LSA), hyperspace analogue to language (HAL), correlated occurrence analogue to lexical semantics (COALS), bound encoding of the aggregate language environment (BEAGLE), and explicit semantic analysis (ESA). The proposed method was evaluated by measuring semantic relatedness on five benchmark datasets used in previous studies. The experimental results demonstrate that incorporating noun compounds into the distributional semantic representation strengthens the semantic evidence for relationships among words.
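
The look-up list step mentioned in the abstract can be illustrated with a minimal sketch: a list of noun compounds (assumed here to have been extracted from Wikipedia beforehand) is matched against the token stream by greedy longest-first matching, so that multi-word terms are treated as single units before the distributional representation is built. The names below (noun_compounds, merge_noun_compounds) and the example entries are illustrative assumptions, not the authors' code or data.

```python
# Illustrative look-up list of noun compounds, stored as token tuples.
noun_compounds = {
    ("semantic", "relatedness"),
    ("latent", "semantic", "analysis"),
    ("noun", "compound"),
}

# Longest compound length, so the longest spans are tried first.
MAX_LEN = max(len(nc) for nc in noun_compounds)

def merge_noun_compounds(tokens):
    """Replace multi-word noun compounds with single underscore-joined tokens."""
    merged, i = [], 0
    while i < len(tokens):
        match = None
        # Greedy longest match against the look-up list.
        for span in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + span])
            if candidate in noun_compounds:
                match = span
                break
        if match:
            merged.append("_".join(tokens[i:i + match]))
            i += match
        else:
            merged.append(tokens[i])
            i += 1
    return merged

print(merge_noun_compounds("latent semantic analysis measures semantic relatedness".split()))
# ['latent_semantic_analysis', 'measures', 'semantic_relatedness']
```

After this merging step, each noun compound appears as a single term in the corpus, so methods such as LSA, HAL, COALS, BEAGLE, and ESA build a vector for the compound itself rather than for its constituent words.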
