Controlling Complexity in Part-of-Speech Induction

J V Graca, B Taskar, L Coheur, F Pereira, K Ganchev

doi:10.1613/jair.3348

Abstract

We consider the problem of fully unsupervised learning of grammatical (part-of-speech) categories from unlabeled text. The standard maximum-likelihood hidden Markov model for this task performs poorly because of its weak inductive bias and large model capacity. We address this problem by refining the model and modifying the learning objective to control its capacity via parametric and non-parametric constraints. Our approach enforces word-category association sparsity, adds morphological and orthographic features, and eliminates hard-to-estimate parameters for rare words. We develop an efficient learning algorithm that is not much more computationally intensive than standard training. We also provide an open-source implementation of the algorithm. Our experiments on five diverse languages (Bulgarian, Danish, English, Portuguese, Spanish) achieve significant improvements compared with previous methods for the same task.
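To make "word-category association sparsity" concrete, here is a minimal sketch of one way such sparsity could be measured. It is not taken from the abstract: the function name `l1_linf_sparsity`, the toy tokens, and the choice of an L1/L-infinity style measure (for each word type and tag, take the largest posterior over that word's occurrences, then sum) are all assumptions made for illustration. A lower value means each word type is associated with fewer categories.

```python
from collections import defaultdict


def l1_linf_sparsity(tokens, posteriors):
    """Hypothetical sparsity measure (an assumption, not the paper's stated API).

    tokens:     list of word strings, one per token position.
    posteriors: list of per-token tag distributions (lists of floats).
    Returns the sum over (word, tag) pairs of the maximum posterior that
    the tag receives across all occurrences of the word.
    """
    max_post = defaultdict(float)
    for word, dist in zip(tokens, posteriors):
        for tag, p in enumerate(dist):
            key = (word, tag)
            # Track the largest posterior seen for this (word, tag) pair.
            if p > max_post[key]:
                max_post[key] = p
    return sum(max_post.values())


# Toy data: two tags, three tokens.
tokens = ["run", "run", "the"]
# "run" always receives tag 0: one category per word type.
concentrated = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# "run" receives tag 0 once and tag 1 once: two categories for one word type.
spread = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

print(l1_linf_sparsity(tokens, concentrated))
print(l1_linf_sparsity(tokens, spread))
```

Under this assumed measure, the tagging that gives "run" a single category scores lower than the one that splits it across two, which is the kind of preference a sparsity constraint would encode.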

Journal: Journal of Artificial Intelligence Research, Vol. 41
Publication Date: Aug 30, 2011
Citations: 4
License type: cc-by
