Abstract

In this study, we propose a new evaluation scheme to assess the strengths and limitations of collocation extraction measures and explore type-sensitive methods for extracting collocations. We introduce the pooling strategy widely used in Information Retrieval and automate the evaluation process using online dictionaries. Sixteen well-known metrics are evaluated for effectiveness and then compared distributionally and linguistically. The results show that Group A methods (e.g. z-score, Dice, PMI) are more effective at extracting low-frequency collocations at relatively small extraction scales. In contrast, Group B methods (e.g. t-test, LMI, LLR) perform better at finding high-frequency collocations, and most of them outperform Group A methods as the extraction scale increases. Moreover, Group A methods prefer NN collocations, while Group B methods identify collocations with a wide range of syntactic structures. This study offers suggestions for future work on hybrid extraction methods, as well as for language educators and dictionary compilers.
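To make the Group A / Group B contrast concrete, the following is a minimal sketch of five of the named measures in their common textbook forms, computed from raw bigram counts. It is not the study's implementation: the function name `association_scores`, its parameter names, and the two sets of counts are hypothetical and chosen only for illustration, and the study may define the measures slightly differently.

```python
import math


def association_scores(f_xy, f_x, f_y, n):
    """Score one word pair from raw counts (illustrative textbook formulas).

    f_xy: co-occurrence count of the pair (must be > 0)
    f_x, f_y: corpus frequencies of each word
    n: total number of bigrams in the corpus
    """
    expected = f_x * f_y / n  # expected co-occurrences under independence
    return {
        "pmi": math.log2(f_xy / expected),
        "dice": 2 * f_xy / (f_x + f_y),
        "z_score": (f_xy - expected) / math.sqrt(expected),
        "t_score": (f_xy - expected) / math.sqrt(f_xy),
        "lmi": f_xy * math.log2(f_xy / expected),
    }


# Made-up counts: a rare but exclusive pair vs. a frequent, looser pair.
rare = association_scores(f_xy=5, f_x=6, f_y=7, n=1_000_000)
frequent = association_scores(f_xy=500, f_x=5000, f_y=8000, n=1_000_000)

for name in rare:
    winner = "rare" if rare[name] > frequent[name] else "frequent"
    print(f"{name}: higher score for the {winner} pair")
```

With these invented counts, PMI, Dice, and z-score score the rare pair higher, while t-score and LMI score the frequent pair higher, which is in line with the frequency preferences reported above. It is a toy comparison, not evidence for the result.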
