Compute the Term Contributed Frequency

Cheng-Lung Sung, Wen-Lian Hsu, Hsu-Chun Yen

doi:10.1109/isda.2008.152

Publication Date: Nov 1, 2008

Citations: 16

Affiliation: Institute of Information Science, Academia Sinica, National Taiwan University

#Proposed Data Structure #Corpus-Based Natural Language Processing

Abstract

In this paper, we propose an algorithm and a data structure for computing the term contributed frequency (tcf) of all N-grams in a text corpus. Although term frequency is one of the standard notions of frequency in corpus-based natural language processing (NLP), applying it to N-gram approaches raises problems such as the distortion of phrase frequencies. We attempt to overcome this drawback by building a directed acyclic graph (DAG) that contains the proposed data structure and using it to retrieve more reliable term frequencies. The proposed algorithm and data structure are more efficient than traditional term frequency extraction approaches and are portable to various languages.
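The abstract does not spell out the exact definition of tcf or the layout of the DAG, so the following Python sketch rests on an assumption: it treats the tcf of an N-gram as the number of its occurrences that are not contained in a longer N-gram (up to a hypothetical length cap `max_n`) occurring at least `min_tf` times. Each occurrence is linked to the at most two occurrences that extend it by one token on the left or right; these links form a DAG, and because every substring of a repeated N-gram is itself repeated, checking only these one-token extensions is enough. The names `ngram_tf` and `term_contributed_frequency` and both parameters are illustrative, not the authors' API or their data structure.

```python
from collections import Counter
from typing import Dict, List, Tuple

Gram = Tuple[str, ...]


def ngram_tf(tokens: List[str], max_n: int) -> Counter:
    """Raw term frequency: count every occurrence of every n-gram with 1 <= n <= max_n."""
    tf: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            tf[tuple(tokens[i:i + n])] += 1
    return tf


def term_contributed_frequency(
    tokens: List[str], max_n: int, min_tf: int = 2
) -> Dict[Gram, int]:
    """Assumed tcf: occurrences of each n-gram not covered by a longer repeated n-gram.

    An occurrence of length n at position i is linked to its left extension
    (position i - 1, length n + 1) and its right extension (position i, length n + 1).
    If some longer n-gram of length <= max_n with tf >= min_tf contains the
    occurrence, one of these one-token extensions is also repeated, so checking
    the two extensions is sufficient.
    """
    tf = ngram_tf(tokens, max_n)
    tcf: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            covered = False
            if n < max_n:  # n-grams of length max_n have no extensions to check
                left_ok = i > 0 and tf[tuple(tokens[i - 1:i + n])] >= min_tf
                right_ok = i + n < len(tokens) and tf[tuple(tokens[i:i + n + 1])] >= min_tf
                covered = left_ok or right_ok
            if not covered:
                tcf[tuple(tokens[i:i + n])] += 1
    # Report every observed n-gram, including those whose tcf dropped to 0.
    return {gram: tcf[gram] for gram in tf}


if __name__ == "__main__":
    tokens = "new york times reported that new york is big".split()
    tf = ngram_tf(tokens, max_n=3)
    tcf = term_contributed_frequency(tokens, max_n=3)
    for gram in [("new",), ("york",), ("new", "york"), ("times",)]:
        print(" ".join(gram), "tf =", tf[gram], "tcf =", tcf[gram])
```

In the sample sentence, the raw term frequency of "new" is 2, but both occurrences sit inside the repeated bigram "new york", so under this assumed definition its tcf drops to 0 while "new york" keeps a tcf of 2. This illustrates the kind of phrase-frequency distortion the abstract describes. The sketch enumerates every occurrence explicitly and is meant only to make the idea concrete; it does not attempt to reproduce the efficiency claimed for the paper's own data structure.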