Feature selection in multiword expression recognition

Senem Kumova Metin

doi:10.1016/j.eswa.2017.09.047

Abstract

In multiword expression (MWE) recognition, many studies employ different learning methods to decide whether a given word combination is a multiword expression. These recognition methods commonly use a number of features extracted from a data source, frequently the given text itself. Although both the recognition methods and the features have been well studied, we believe that, to achieve the best possible performance with a given learning method, different subsets of features should also be considered and the best-performing subset selected.

In this paper, we propose a procedure that compares the performance of well-known feature selection methods in order to obtain the best feature subset for MWE recognition. Evaluation is performed on a Turkish MWE data set, and performance is measured by precision, recall and F1. The highest F1 value, 0.731, is obtained by the C4.5 classifier using either a wrapper or a filter method for feature selection. In these settings, performance increases by 1.11% compared to the setting in which all features are used in classification.

The experimental results suggest that feature selection improves the performance of MWE recognition by eliminating noisy or non-effective features. Moreover, the proposed feature selection procedure contributes to the overall MWE recognition system by reducing measurement and storage requirements, since fewer features are used in classification, which yields a faster and more cost-effective learning model.
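To make the contrast between filter and wrapper feature selection concrete, the sketch below runs both on a tiny hypothetical data set and scores each subset by F1. It is not the paper's procedure. It rests on several assumptions: features are normalized scores in [0, 1]; `mean_threshold_predict` is a hypothetical stand-in for the C4.5 classifier; the filter score (absolute gap between class means) and the wrapper strategy (greedy forward selection) are illustrative choices; and subsets are scored on the training data itself, whereas a real evaluation would use held-out data or cross-validation.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive (MWE) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mean_threshold_predict(X: Matrix, features: Sequence[int],
                           threshold: float = 0.5) -> List[int]:
    """Hypothetical stand-in for C4.5: predict MWE (1) when the mean of the
    selected feature values reaches the threshold."""
    return [1 if sum(row[j] for j in features) / len(features) >= threshold else 0
            for row in X]


def filter_select(X: Matrix, y: Sequence[int], k: int) -> List[int]:
    """Filter method: score each feature without any classifier (here by the
    absolute gap between class means) and keep the k best."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]

    def gap(j: int) -> float:
        return abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))

    ranked = sorted(range(len(X[0])), key=gap, reverse=True)
    return sorted(ranked[:k])


def wrapper_select(n_features: int,
                   score_subset: Callable[[List[int]], float]) -> Tuple[List[int], float]:
    """Wrapper method (greedy forward selection): repeatedly add the feature whose
    inclusion gives the best classifier score; stop when nothing improves it."""
    selected: List[int] = []
    best = float("-inf")
    remaining = list(range(n_features))
    while remaining:
        score, feature = max((score_subset(selected + [f]), f) for f in remaining)
        if score <= best:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return sorted(selected), best


# Hypothetical toy data: 6 candidate word combinations, 3 features each.
# Feature 0 separates the classes well, feature 2 partly, feature 1 is noise.
X = [[0.9, 0.1, 0.8], [0.8, 0.9, 0.3], [0.7, 0.2, 0.7],
     [0.2, 1.0, 0.45], [0.1, 0.3, 0.1], [0.3, 0.7, 0.2]]
y = [1, 1, 1, 0, 0, 0]


def f1_of_subset(features: Sequence[int]) -> float:
    # Scored on the training data only, to keep the sketch short.
    return precision_recall_f1(y, mean_threshold_predict(X, features))[2]


print(round(f1_of_subset([0, 1, 2]), 3))  # all features → 0.857
print(filter_select(X, y, k=2))           # → [0, 2]
print(wrapper_select(3, f1_of_subset))    # → ([0], 1.0)
```

On this made-up data the filter keeps features 0 and 2 and the wrapper keeps only feature 0; both drop the noisy feature 1 and score above the all-features setting. The design difference is cost: the filter ranks features once without consulting a classifier, while the wrapper retrains and re-scores the classifier for every candidate subset it tries.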
