Abstract

A hybrid methodology is proposed for extracting multiword expressions using both linguistic and statistical information. In the proposed methodology, N-grams are first extracted using linguistic patterns, and various statistical measures are then applied to classify these N-grams as multiword expressions. To address the problem of choosing a cut-off threshold in the statistical filtering phase, a novel method for calculating the threshold is designed. A comparative analysis of the baseline method and the proposed methodology is presented. In the baseline method, the order is reversed: N-grams are first filtered by statistical measures, and linguistic filtering is applied afterwards. Precision, recall and F-score are calculated on a manually annotated corpus. The observed results show that the proposed methodology provides good results for certain types of multiword expressions, such as compound nouns, verb-particle and verb-verb expressions.
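The two-stage pipeline described above (linguistic filtering followed by statistical scoring, then evaluation against gold annotations) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the POS patterns in PATTERNS, the choice of pointwise mutual information (PMI) as the statistical measure, the restriction to bigrams, and the function names. The paper's threshold calculation method is not described in the abstract, so the threshold appears only as a caller-supplied placeholder.

```python
import math
from collections import Counter

# Hypothetical POS-tag patterns for candidate bigrams (an assumption,
# loosely mirroring compound nouns, verb-particle and verb-verb types).
PATTERNS = {("NOUN", "NOUN"), ("VERB", "PART"), ("VERB", "VERB")}


def extract_candidates(tagged_sentences):
    """Linguistic filter: count bigrams whose POS tags match a pattern."""
    candidates = Counter()
    for sent in tagged_sentences:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if (t1, t2) in PATTERNS:
                candidates[(w1, w2)] += 1
    return candidates


def pmi_scores(candidates, tagged_sentences):
    """Statistical measure: PMI of each candidate bigram.

    PMI is one common association measure, used here as a stand-in;
    the abstract does not say which measures the paper applies.
    """
    unigrams = Counter(w for sent in tagged_sentences for w, _ in sent)
    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(max(len(sent) - 1, 0) for sent in tagged_sentences)
    scores = {}
    for (w1, w2), count in candidates.items():
        p_xy = count / total_bigrams
        p_x = unigrams[w1] / total_unigrams
        p_y = unigrams[w2] / total_unigrams
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def select_mwes(scores, threshold):
    """Keep candidates scoring at or above a caller-supplied threshold.

    The threshold is a placeholder; the paper's own method for
    computing it is not reproduced here.
    """
    return {bigram for bigram, score in scores.items() if score >= threshold}


def precision_recall_f1(predicted, gold):
    """Evaluate a predicted set of expressions against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Under these assumptions, the baseline ordering described in the abstract would correspond to scoring all bigrams first and applying the PATTERNS check afterwards, so the two orderings can be compared with the same evaluation function.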
