A measure of term representativeness based on the number of co-occurring salient words

Yoshiki Niwa, Toru Hisamitsu

doi:10.3115/1072228.1072353

Open Access

https://doi.org/10.3115/1072228.1072353

Publication Date: Jan 1, 2002

Citations: 25

Affiliation: Hitachi (Japan)

#Number Of Distinct Words #Co-occurring Words

Abstract

We propose a novel measure of the representativeness (i.e., indicativeness or topic specificity) of a term in a given corpus. The measure embodies the idea that the distribution of words co-occurring with a representative term should be biased relative to the word distribution in the whole corpus. This bias is quantified as the number of distinct words whose occurrences are saliently biased among the co-occurring words. The saliency of a word is defined by a threshold probability that can be determined automatically from the whole corpus. A comparative evaluation showed that the measure is clearly superior to conventional measures at finding topic-specific words in newspaper archives of different sizes.
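
To make the counting idea in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the saliency test, so the sketch assumes a word is salient when the binomial upper-tail probability of its count among the co-occurring words, given its relative frequency in the whole corpus, falls below a threshold. The names `binomial_tail` and `count_salient_words`, the document-level definition of co-occurrence, and the fixed `threshold` parameter are all hypothetical; the paper derives its threshold automatically from the corpus, which this sketch does not attempt.

```python
from collections import Counter
from math import exp, lgamma, log


def binomial_tail(k, n, p):
    """Return P[X >= k] for X ~ Binomial(n, p), with terms computed in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1.0 - p)
        )
        total += exp(log_term)
    return min(total, 1.0)


def count_salient_words(term, documents, threshold=0.01):
    """Count distinct words that are over-represented in documents containing `term`.

    `documents` is a list of token lists. Assumption: a word is "salient" when
    its count in the co-occurring words is improbably high (binomial tail below
    `threshold`) given its relative frequency in the whole corpus.
    """
    # Word frequencies over the whole corpus.
    corpus_counts = Counter(w for doc in documents for w in doc)
    corpus_size = sum(corpus_counts.values())

    # Words co-occurring with the term: all other tokens in documents containing it.
    cooc_counts = Counter(
        w for doc in documents if term in doc for w in doc if w != term
    )
    cooc_size = sum(cooc_counts.values())

    salient = 0
    for word, k in cooc_counts.items():
        p = corpus_counts[word] / corpus_size
        if binomial_tail(k, cooc_size, p) < threshold:
            salient += 1
    return salient
```

Under these assumptions, a term with a higher count has more distinct salient co-occurring words. Scores of terms with very different frequencies are not directly comparable in this sketch, since it applies no normalization.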

Full Text