Segmenting the web document with document object model

Jianli Luo, Cuihua Xie, Jie Shen

doi:10.1109/scc.2004.1358040

Publication Date: Sep 15, 2004

Citations: 9

Affiliation: Yangzhou University

#DOM Tree #Text Segmentation

Abstract

We present a model for DOM-based Web document segmentation that exploits the semi-structured information of Web pages. The model builds the DOM tree of a Web page by parsing the HTML tags that organize the page's structure. By improving traditional plain-text segmentation algorithms, we extend them to Web text segmentation. The boundaries between nodes in the DOM tree are then used to further increase the precision of the segmentation results.
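The pipeline described in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm. The block-tag set, the stopword list, the `threshold` value, and the names `DomBuilder`, `text_blocks`, and `segment` are hypothetical, and the plain-text segmenter is swapped for a simplified TextTiling-style cosine similarity between adjacent blocks. The sketch builds a DOM tree with Python's standard `html.parser`, treats the text of each block-level node as one unit, and lets lexical similarity decide which gaps between DOM nodes become segment boundaries, so a boundary can never fall inside a paragraph.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Assumed (hypothetical) set of block-level tags whose boundaries are candidate segment breaks.
BLOCK_TAGS = {"p", "div", "li", "td", "section", "article",
              "h1", "h2", "h3", "h4", "h5", "h6"}
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "on", "for", "with"}


class Node:
    """A minimal DOM node: a tag name plus ordered children (Nodes or text strings)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class DomBuilder(HTMLParser):
    """Builds a simple DOM tree by parsing HTML tags with the standard-library parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never contain children
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        # Keep visible text only; skip whitespace and script/style contents.
        if data.strip() and self.current.tag not in {"script", "style"}:
            self.current.children.append(data.strip())


def inline_text(node):
    """Text of a node and its inline descendants, stopping at nested block elements."""
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(child)
        elif child.tag not in BLOCK_TAGS:
            parts.append(inline_text(child))
    return " ".join(p for p in parts if p)


def text_blocks(node):
    """Yield the non-empty text of every block-level element in document order."""
    for child in node.children:
        if isinstance(child, Node):
            if child.tag in BLOCK_TAGS:
                text = inline_text(child)
                if text:
                    yield text
            yield from text_blocks(child)


def term_vector(text):
    """Bag-of-words term counts, lowercased, with a few stopwords removed."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def segment(html, threshold=0.1):
    """Group DOM text blocks into topical segments.

    A boundary is placed between two adjacent blocks when their lexical
    similarity falls below `threshold`, so boundaries only fall between DOM nodes.
    """
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    blocks = list(text_blocks(builder.root))
    if not blocks:
        return []
    segments = [[blocks[0]]]
    for prev, cur in zip(blocks, blocks[1:]):
        if cosine(term_vector(prev), term_vector(cur)) < threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return segments
```

For example, a page with two `<div>`s, one holding two paragraphs about cats and the other two paragraphs about stock markets, is split into two segments, because the adjacent paragraphs across the `<div>` gap share no content words. Restricting candidate boundaries to gaps between block-level DOM nodes is one plausible reading of how DOM structure can raise precision; a fuller implementation would also weigh the tree depth or shared ancestors of adjacent nodes.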

Full Text