Automatic subject heading assignment for online government publications using a semi‐supervised machine learning approach

Xiao Hu, Jing Zhang, Larry S Jackson, Sai Deng

doi:10.1002/meet.14504201139

Abstract

As online publication continues to expand dramatically, state libraries urgently need effective tools to organize and archive the large number of government documents published online. Automatic text categorization techniques can classify documents approximately, given a sufficient number of labeled training examples. However, obtaining training labels is expensive because it requires substantial manual labor. We present a real-world online government information preservation project (PEP) in the State of Illinois, together with a semi‐supervised machine learning approach: an Expectation‐Maximization (EM) algorithm‐based text classifier that automatically assigns subject headings to documents harvested in the PEP project. The EM classifier makes use of easily obtained unlabeled documents and thus reduces the demand for labeled training examples. This paper describes both the context and the procedure of this application. Experimental results are reported, and alternative approaches are also discussed.
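The abstract does not say which base classifier the EM procedure wraps, so the following is only a minimal sketch of the general idea, assuming a multinomial naive Bayes model, which is a common pairing with EM for text. An initial model is trained on the labeled documents, then the loop alternates between estimating soft labels for the unlabeled documents (E-step) and retraining on both sets (M-step). The function names (`train_nb`, `posterior`, `em_classifier`), the subject headings, and the toy documents are all hypothetical and are not taken from the PEP project.

```python
import math
from collections import Counter


def train_nb(docs, weights, classes, vocab):
    """M-step: estimate naive Bayes parameters from (possibly soft) labels.

    docs is a list of token lists; weights[i][c] is P(class c | docs[i]).
    """
    prior, cond = {}, {}
    for c in classes:
        # Laplace-smoothed class prior from expected class counts.
        prior[c] = (1.0 + sum(w[c] for w in weights)) / (len(classes) + len(docs))
        counts = Counter()
        for tokens, w in zip(docs, weights):
            for t in tokens:
                counts[t] += w[c]  # fractional count for soft labels
        total = sum(counts.values()) + len(vocab)
        cond[c] = {t: (counts[t] + 1.0) / total for t in vocab}
    return prior, cond


def posterior(tokens, prior, cond, classes):
    """E-step for one document: P(class | tokens) under the current model."""
    log_p = {}
    for c in classes:
        lp = math.log(prior[c])
        for t in tokens:
            if t in cond[c]:  # tokens outside the vocabulary are ignored
                lp += math.log(cond[c][t])
        log_p[c] = lp
    top = max(log_p.values())  # subtract the max for numerical stability
    exp = {c: math.exp(v - top) for c, v in log_p.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def em_classifier(labeled, unlabeled, iterations=10):
    """Return a classify(tokens) function trained with semi-supervised EM."""
    classes = sorted({y for _, y in labeled})
    vocab = {t for d, _ in labeled for t in d} | {t for d in unlabeled for t in d}
    lab_docs = [d for d, _ in labeled]
    # Labeled documents keep fixed, hard labels throughout.
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    prior, cond = train_nb(lab_docs, hard, classes, vocab)
    for _ in range(iterations):
        soft = [posterior(d, prior, cond, classes) for d in unlabeled]
        prior, cond = train_nb(lab_docs + unlabeled, hard + soft, classes, vocab)

    def classify(tokens):
        post = posterior(tokens, prior, cond, classes)
        return max(post, key=post.get)

    return classify


# Toy data, invented purely for illustration.
labeled = [
    ("budget tax revenue".split(), "Finance"),
    ("school teacher student".split(), "Education"),
]
unlabeled = [
    "tax revenue audit".split(),
    "student curriculum teacher".split(),
    "audit budget".split(),
    "curriculum school".split(),
]
classify = em_classifier(labeled, unlabeled)
```

In this toy setup the words "audit" and "curriculum" never occur in a labeled document; the classifier can only associate them with a heading through their co-occurrence with labeled vocabulary in the unlabeled documents, which is the effect the abstract attributes to the EM classifier.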

Journal: Proceedings of the American Society for Information Science and Technology
Publication Date: Jan 1, 2005
Citations: 4