Discriminating Meaningful Web Tables from Decorative Tables Using a Composite Kernel

Jeong-Woo Son, Se-Young Park, Hyun-Je Song, Seong-Bae Park, Jae-An Lee, Sang-Jo Lee

doi:10.1109/wiiat.2008.241

Abstract

Information extraction from the World Wide Web has received considerable attention. Because a table is a well-organized, summarized expression of knowledge for a domain, extracting information from tables is of great importance. However, many tables in web pages are used not to convey information but to decorate the page. Discriminating meaningful tables from decorative ones is therefore one of the most critical tasks in web table mining. The main obstacle in this task is the difficulty of generating features relevant to the discrimination. This paper proposes a novel method that discriminates the two kinds of tables using a composite kernel, which combines a parse tree kernel and a linear kernel. Since an HTML parser represents a web table as a parse tree, the parse tree kernel is a natural way to measure the similarity between tables, and the linear kernel over content features compensates for the weak points of the parse tree kernel. Support vector machines with the composite kernel distinguish meaningful tables from decorative ones with high accuracy. A series of experiments shows that the proposed method achieves state-of-the-art performance.
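
The combination described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact kernel definitions, so everything below is an assumption made for illustration: the tree kernel is a Collins-Duffy style common-subtree count with a decay factor, both kernels are normalized before mixing, the mixing weight `alpha` and the function names are hypothetical, and the two-element content feature vectors are made-up placeholders rather than the paper's features.

```python
import math

# A table is a pair (parse_tree, content_features).
# A tree node is (tag, [children]); the feature vector is a list of floats.

def production(node):
    """Return the node's tag together with the ordered tags of its children."""
    tag, children = node
    return (tag, tuple(child[0] for child in children))

def nodes(tree):
    """List every node of the tree in pre-order."""
    out = [tree]
    for child in tree[1]:
        out.extend(nodes(child))
    return out

def common_subtrees(n1, n2, decay):
    """Decayed count of common subtrees rooted at n1 and n2 (assumed Collins-Duffy form)."""
    if production(n1) != production(n2):
        return 0.0
    result = decay
    for c1, c2 in zip(n1[1], n2[1]):
        result *= 1.0 + common_subtrees(c1, c2, decay)
    return result

def tree_kernel(t1, t2, decay=0.5):
    """Sum the common-subtree counts over all node pairs of the two trees."""
    return sum(common_subtrees(a, b, decay) for a in nodes(t1) for b in nodes(t2))

def linear_kernel(x, y):
    """Plain dot product of two content feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def normalized(kernel, a, b):
    """Scale a kernel value so that an object compared with itself scores 1."""
    denom = math.sqrt(kernel(a, a) * kernel(b, b))
    return kernel(a, b) / denom if denom > 0 else 0.0

def composite_kernel(table1, table2, alpha=0.5):
    """Weighted sum of the structural and content kernels (alpha is a hypothetical weight)."""
    (tree1, feats1), (tree2, feats2) = table1, table2
    return (alpha * normalized(tree_kernel, tree1, tree2)
            + (1 - alpha) * normalized(linear_kernel, feats1, feats2))

# Two toy tables: a one-cell layout table and a small table with a header row.
layout = (("table", [("tr", [("td", [])])]), [0.0, 1.0])
data = (("table", [("tr", [("th", []), ("th", [])]),
                   ("tr", [("td", []), ("td", [])])]), [1.0, 0.2])

gram = [[composite_kernel(a, b) for b in (layout, data)] for a in (layout, data)]
```

A Gram matrix built this way could be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. A weighted sum of two valid kernels with non-negative weights is itself a valid kernel, which is what makes this kind of combination usable in an SVM.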

Similar Papers

Web table discrimination with composition of rich structural and content information
Jeong-Woo Son ... Seong-Bae Park
Applied Soft Computing Journal | VOL. 13
21 Aug 2012

Two novel composite kernels for relation extraction
Xiaofeng Zhang ... Zhiqiang Gao
01 Jul 2011

Kernel-Based Structural Features for Discriminating Decorative Tables from Meaningful Tables
Jeong-Woo Son ... Jun-Ho Go
Journal of Korean Institute of Intelligent Systems | VOL. 21
25 Oct 2011

A Feature Space Expression to Analyze Dependency of Korean Clauses with a Composite Kernel
Sang-Soo Kim ... Sang-Jo Lee
01 Jan 2007
