Learning verb complements for Modern Greek: balancing the noisy dataset

Katia Kermanidis, Nikos Fakotakis, George Kokkinakis, Manolis Maragoudakis

doi:10.1017/s135132490600413x

Abstract

Attempting to automatically learn to identify verb complements from natural language corpora without the help of sophisticated linguistic resources like grammars, parsers or treebanks leads to a significant amount of noise in the data. In machine learning terms, where learning from examples is performed using class-labelled feature-value vectors, noise leads to an imbalanced set of vectors: assuming that the class label takes two values (in this work complement/non-complement), one class (complements) is heavily underrepresented in the data in comparison to the other. To overcome the drop in accuracy when predicting instances of the rare class due to this disproportion, we balance the learning data by applying one-sided sampling to the training corpus, thereby reducing the number of non-complement instances. This approach has been used in the past in several domains (image processing, medicine, etc.) but not in natural language processing. For identifying the examples that are safe to remove, we use the value difference metric, which proves to be more suitable for nominal attributes like the ones this work deals with than the Euclidean distance traditionally used in one-sided sampling. We experiment with several widely used learning algorithms whose performance is well known to the machine learning community: Bayesian learners, instance-based learners and decision trees. Additionally, we present and test a variation of Bayesian belief networks, the COr-BBN (Class-oriented Bayesian belief network). Performance improves by up to 22% after balancing the dataset, reaching 73.7% f-measure for the complement class, using only a phrase chunker and basic morphological information for preprocessing.
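
The balancing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one common formulation of the value difference metric (the distance between two nominal values is the summed difference of their class-conditional probabilities) and shows only one step often used in one-sided sampling: dropping the majority-class member of each cross-class pair of mutual nearest neighbours. The step that discards redundant majority examples is omitted. The feature values ("acc", "NP", ...), the labels "C" (complement) and "N" (non-complement), and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def fit_vdm(X, y):
    """Estimate P(class | value) for every nominal attribute position."""
    classes = sorted(set(y))
    tables = []
    for a in range(len(X[0])):
        counts = defaultdict(Counter)
        for row, label in zip(X, y):
            counts[row[a]][label] += 1
        probs = {}
        for value, per_class in counts.items():
            total = sum(per_class.values())
            probs[value] = [per_class[c] / total for c in classes]
        tables.append(probs)
    return tables


def vdm_distance(tables, u, v, q=2):
    """Sum over attributes of sum_c |P(c | u_a) - P(c | v_a)|^q."""
    dist = 0.0
    for table, a, b in zip(tables, u, v):
        dist += sum(abs(p - r) ** q for p, r in zip(table[a], table[b]))
    return dist


def nearest(tables, X, i):
    """Index of the instance closest to X[i]; ties go to the lowest index."""
    others = (j for j in range(len(X)) if j != i)
    return min(others, key=lambda j: vdm_distance(tables, X[i], X[j]))


def drop_borderline_majority(X, y, majority):
    """Remove majority instances that form a mutual-nearest-neighbour
    pair with an instance of the other class (two-class data assumed)."""
    tables = fit_vdm(X, y)
    nn = [nearest(tables, X, i) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:
            drop.add(i if y[i] == majority else j)
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


# Hypothetical toy data: (case of the phrase head, phrase type).
X = [("acc", "NP"), ("acc", "NP"), ("nom", "NP"),
     ("nom", "PP"), ("gen", "PP"), ("nom", "NP")]
y = ["C", "N", "N", "N", "N", "N"]

X_bal, y_bal = drop_borderline_majority(X, y, majority="N")
print(len(X), "->", len(X_bal), "instances;", y_bal.count("C"), "complement kept")
```

In the toy data the second instance has the same feature values as the only complement but carries the opposite label, so it is treated as noise and removed, while every complement instance is kept. The brute-force neighbour search is quadratic in the number of instances, so a real corpus would need something more efficient.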

More From: Natural Language Engineering

