Cost Evaluation of CRF-Based Bibliography Extraction from Reference Strings

Naomichi Kawakami, Manabu Ohta, Jun Adachi, Atsuhiro Takasu

doi:10.1007/978-3-319-12823-8_28

Publication Date: Jan 1, 2014

Citations: 9

Affiliation: Okayama University, National Institute of Informatics

#Reference Strings #Conditional Random Field

Abstract

The effective use of digital libraries depends on well-maintained bibliographic databases. The reference fields of academic papers are a particularly rich source of bibliographic information, such as authors’ names and paper titles. We therefore propose a method for automatically extracting bibliographic information from reference strings using a conditional random field (CRF). However, training a CRF to achieve high extraction accuracy requires at least a few hundred reference strings. To reduce the amount of training data required, we propose the use of active sampling and pseudo-training data, and we evaluate the associated training costs experimentally.
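
As a rough illustration of what active sampling means in this setting, the sketch below picks the unlabeled reference strings that a model is least confident about, so that manual labeling effort goes to the most informative examples. The abstract does not state which selection criterion the authors use; least-confidence selection is assumed here, and `select_for_labeling`, the `confidence` callable, and the toy scores are all hypothetical names and values introduced for illustration only.

```python
from typing import Callable, List, Sequence


def select_for_labeling(
    pool: Sequence[str],
    confidence: Callable[[str], float],
    k: int,
) -> List[str]:
    """Return the k strings from pool with the lowest model confidence.

    `confidence` is a hypothetical stand-in for a trained CRF's score
    for its best labeling of a reference string (assumption: higher
    means more certain).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort by confidence, least certain first; Python's sort is stable,
    # so ties keep their original pool order.
    ranked = sorted(pool, key=confidence)
    return ranked[:k]


# Toy pool with made-up confidence scores (not results from the paper).
pool = ["ref A", "ref B", "ref C"]
toy_scores = {"ref A": 0.95, "ref B": 0.40, "ref C": 0.70}

picked = select_for_labeling(pool, toy_scores.__getitem__, k=2)
print(picked)
```

In a real loop, the selected strings would be labeled by hand, added to the training set, and the model retrained before the next round of selection.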

Full Text