Automatically Extracting Academic Papers from Web Pages Using Conditional Random Fields Model

Wei Liu, Jianxun Zeng

doi:10.4304/jsw.6.8.1409-1416

Journal: Journal of Software
Publication Date: Nov 8, 2011
Citations: 3

Affiliation: Institute of Scientific and Technical Information of China

#Conditional Random Fields Model #Academic Papers

Abstract

A huge number of academic papers (including research reports) are being released on web pages. Extracting these papers in a structured way is important for many popular applications, such as science and technology information retrieval and digital libraries. However, few investigations have addressed the problem of academic paper extraction. This paper proposes a unified approach for automatically extracting academic papers from web pages based on the Conditional Random Fields (CRF) model. In the proposed approach, academic paper extraction and semantic labeling are performed simultaneously by employing the CRF model. Experimental results show that our approach can achieve significantly better extraction results.
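
To make the idea of joint extraction and labeling concrete, the following is a minimal sketch of linear-chain CRF decoding over the text lines of a page. It is not the authors' model: the label set (`TITLE`, `AUTHOR`, `OTHER`), the feature names, and all weights are hypothetical and hand-set for illustration, whereas a real CRF would learn its weights from annotated pages. The sketch only shows how a single sequence labeling pass can decide which lines belong to a paper record and what role each line plays.

```python
# Illustrative linear-chain CRF decoding (Viterbi) over page lines.
# Labels, features, and weights are hypothetical, not taken from the paper.
LABELS = ["TITLE", "AUTHOR", "OTHER"]

# Emission weights: (feature, label) -> score. Hand-set assumptions.
EMIT = {
    ("bias", "OTHER"): 1.0,
    ("long", "TITLE"): 2.0,
    ("has_comma", "AUTHOR"): 2.0,
    ("all_capitalized", "AUTHOR"): 1.0,
}

# Transition weights: (previous label, label) -> score. Hand-set assumptions.
TRANS = {
    ("TITLE", "AUTHOR"): 1.5,
    ("OTHER", "OTHER"): 0.5,
}


def features(line):
    """Return the names of the binary features that fire for one text line."""
    words = line.split()
    feats = ["bias"]
    if "," in line:
        feats.append("has_comma")
    if len(words) >= 6:
        feats.append("long")
    if words and all(w[0].isupper() for w in words):
        feats.append("all_capitalized")
    return feats


def viterbi(lines):
    """Return the highest-scoring label sequence for the given lines."""
    if not lines:
        return []
    emit = [
        {y: sum(EMIT.get((f, y), 0.0) for f in features(x)) for y in LABELS}
        for x in lines
    ]
    score = [dict(emit[0])]
    back = []
    for t in range(1, len(lines)):
        row, ptr = {}, {}
        for y in LABELS:
            # Best previous label for reaching label y at position t.
            prev = max(LABELS, key=lambda p: score[-1][p] + TRANS.get((p, y), 0.0))
            row[y] = score[-1][prev] + TRANS.get((prev, y), 0.0) + emit[t][y]
            ptr[y] = prev
        score.append(row)
        back.append(ptr)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: score[-1][y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def extract(lines):
    """Keep only the lines labeled as part of a paper record."""
    return [(y, x) for x, y in zip(lines, viterbi(lines)) if y != "OTHER"]


page_lines = [
    "Home | About | Contact",
    "Automatically Extracting Academic Papers from Web Pages Using Conditional Random Fields Model",
    "Wei Liu, Jianxun Zeng",
    "Copyright 2011",
]
for label, text in extract(page_lines):
    print(label, "->", text)
```

Because the transition weights reward an `AUTHOR` line directly after a `TITLE` line, the decision for each line depends on its neighbors as well as its own features, which is the property that lets one model handle both extraction and labeling in this sketch.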

Full Text