Abstract

Conditional maximum entropy (ME) models provide a general-purpose machine learning technique which has been successfully applied to fields as diverse as computer vision and econometrics, and which is used for a wide variety of classification problems in natural language processing. However, the flexibility of ME models is not without cost. While parameter estimation for ME models is conceptually straightforward, in practice ME models for typical natural language tasks are very large, and may well contain many thousands of free parameters. In this paper, we consider a number of algorithms for estimating the parameters of ME models, including iterative scaling, gradient ascent, conjugate gradient, and variable metric methods. Surprisingly, the widely used iterative scaling algorithms perform quite poorly in comparison to the others, and for all of the test problems, a limited-memory variable metric algorithm outperformed the other choices.
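To make the estimation problem concrete, the following is a minimal sketch (not the paper's implementation) of fitting a conditional ME model with a limited-memory variable metric method, the family of algorithms the abstract reports as the best performer. It uses SciPy's L-BFGS-B optimizer on toy data; all variable names and the data are hypothetical, and the gradient computed is the standard one for ME models: observed feature expectations minus the model's expectations.

```python
# Sketch only: conditional maximum entropy estimation via a
# limited-memory variable metric method (L-BFGS), on toy data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # 200 contexts, 5 real-valued features
y = rng.integers(0, 3, size=200)       # 3 candidate classes
n_classes, n_feats = 3, X.shape[1]

def neg_log_likelihood(w_flat):
    """Negative conditional log-likelihood and its gradient.

    The gradient in each parameter is the observed feature expectation
    minus the model expectation -- the quantity that iterative scaling,
    gradient ascent, conjugate gradient, and variable metric methods
    all work from.
    """
    W = w_flat.reshape(n_classes, n_feats)
    scores = X @ W.T                               # unnormalized log-probs
    scores -= scores.max(axis=1, keepdims=True)    # stabilize the exp
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)      # p(y | x)
    ll = np.log(probs[np.arange(len(y)), y]).sum()
    obs = np.zeros_like(W)
    np.add.at(obs, y, X)                           # observed feature counts
    exp_ = probs.T @ X                             # expected feature counts
    grad = (obs - exp_).ravel()
    return -ll, -grad

w0 = np.zeros(n_classes * n_feats)
res = minimize(neg_log_likelihood, w0, jac=True, method="L-BFGS-B")
print("converged:", res.success, "final neg. log-likelihood:", res.fun)
```

Swapping `method="L-BFGS-B"` for `method="CG"` gives a conjugate gradient baseline for comparison; the objective and gradient are unchanged across methods, which is what makes the paper's head-to-head comparison of optimizers meaningful.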
