Abstract

The technique to identify and extract collocations in a given sentence is very important to sentence understanding, analysing and translating. So we propose a sentence level collocation identification and extraction method which follows the traditional two phase collocation extraction model. In candidate generating phase, we use the dependency parsing results directly, while in the filtering phase, we propose to use the latest model of distributional semantics - word embedding based similarity to filter the noises. For each candidate, three word embedding based similarity rankings will be obtained and accordingly to decide if it is a real collocation. The experimental results show that the proposed filtering method performs better than the traditional well-known association measures. The comparison with the baseline system shows that the proposed method can retrieve more collocations with higher precision than the baseline, which is of significance to sentence related natural language processing tasks.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call