TOC Structure Extraction from OCR-ed Books

Caihua Liu, Yalou Huang, Jie Liu, Xiaofeng Zhang, Jiajun Chen

doi:10.1007/978-3-642-35734-3_8

Publication Date: Jan 1, 2012

Citations: 15

Affiliation: Nankai University

Abstract

This paper addresses the task of extracting the table of contents (TOC) from OCR-ed books. Because the OCR process loses much of a book's layout and structural information, its output cannot by itself support convenient navigation. A TOC provides a quick way to locate content of interest. In this paper, we propose a hybrid TOC extraction method that combines a rule-based method and an SVM-based method. The rule-based method mainly targets books that contain TOC pages, while the SVM-based method handles books without TOC pages. Experimental results indicate that the proposed methods achieve performance comparable to that of the other participants in the ICDAR 2011 Book Structure Extraction competition.
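
The abstract only outlines the hybrid design, so the following Python sketch shows one way such a dispatcher could be organized. Everything concrete in it is an assumption and not taken from the paper: the regular expression for TOC entries (`TOC_LINE`), the TOC-page threshold `min_ratio`, the front-matter search window `max_toc_search`, the text-only line features, and the weights `WEIGHTS`/`BIAS`. The weights are hand-set placeholders standing in for a trained linear SVM, whose prediction reduces to the sign of w·x + b; a real system would learn them from labeled heading lines.

```python
import re
from dataclasses import dataclass

# Assumed TOC-entry shape: a title, optional dot leaders, then a trailing page number.
TOC_LINE = re.compile(r"^\s*(?P<title>\S.*?\S)\s*(?:\.{2,}\s*)?(?P<page>\d{1,4})\s*$")

# Hypothetical weights standing in for a trained linear SVM (w, b); NOT learned values.
WEIGHTS = (2.0, 1.5, -0.1, 1.5)
BIAS = -1.0


@dataclass
class Entry:
    title: str
    page: int  # printed page number (rule path) or physical page index (classifier path)


def parse_toc_line(line):
    """Return an Entry if the line looks like 'Title ..... 12', else None."""
    m = TOC_LINE.match(line)
    if m is None or not any(c.isalpha() for c in m.group("title")):
        return None
    return Entry(m.group("title").rstrip(" ."), int(m.group("page")))


def looks_like_toc_page(lines, min_ratio=0.5):
    """Heuristic: a page is a TOC page if enough non-empty lines parse as entries."""
    nonempty = [l for l in lines if l.strip()]
    if not nonempty:
        return False
    hits = sum(parse_toc_line(l) is not None for l in nonempty)
    return hits / len(nonempty) >= min_ratio


def rule_based_toc(toc_lines):
    """Rule path: keep every line of the detected TOC pages that parses as an entry."""
    return [e for e in map(parse_toc_line, toc_lines) if e is not None]


def features(line):
    """Text-only features of one OCR line (an assumed, illustrative feature set)."""
    text = line.strip()
    keyword = 1.0 if re.match(r"(chapter|part|section)\b", text, re.IGNORECASE) else 0.0
    numbered = 1.0 if re.match(r"(\d+\.?|[IVXLC]+\.)\s", text) else 0.0
    n_words = float(len(text.split()))
    letters = [c for c in text if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / max(1, len(letters))
    return (keyword, numbered, n_words, upper_ratio)


def is_heading(line):
    """Linear decision function: heading iff w.x + b > 0, as a linear SVM would predict."""
    score = sum(w * x for w, x in zip(WEIGHTS, features(line))) + BIAS
    return score > 0


def svm_based_toc(pages):
    """Classifier path: every line predicted as a heading becomes a TOC entry."""
    return [Entry(l.strip(), no)
            for no, lines in enumerate(pages, start=1)
            for l in lines if l.strip() and is_heading(l)]


def extract_toc(pages, max_toc_search=15):
    """Hybrid dispatch: rules if TOC pages are found near the front, else the classifier."""
    toc_pages = [p for p in pages[:max_toc_search] if looks_like_toc_page(p)]
    if toc_pages:
        return rule_based_toc([l for p in toc_pages for l in p])
    return svm_based_toc(pages)
```

For example, `extract_toc([["Contents", "Preface ..... 7"]])` finds a TOC page and takes the rule-based path, returning `[Entry(title='Preface', page=7)]`, while a book whose front pages contain no entry-like lines falls through to the line classifier. Keeping TOC-page detection separate from the two extraction paths mirrors the split described in the abstract.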

Full Text