Abstract

Web page segmentation is valuable in many web applications. Traditional methods based only on the Document Object Model (DOM) tree poorly reflect the actual semantic structure of a page, so vision-based approaches, which attempt to model how users perceive a page, have recently attracted considerable attention. The visual layout of a page is also better suited to suggesting a semantic partitioning of it. The Vision-based Page Segmentation (VIPS) algorithm is a notable technique that addresses cases where the visual appearance of a web page is inconsistent with its DOM tree. However, as web page structures grow increasingly complicated and web design keeps changing, the rules in VIPS have become numerous and are no longer fully applicable. To alleviate this shortcoming of VIPS, this paper introduces the Hough transform from image processing and combines it with DOM trees and visual cues. The proposed method first extracts the visual separators in a web page from the perspective of web designers, then adjusts the segmented information blocks using the Hough transform, with the aim of enhancing the compatibility of the VIPS algorithm and improving the performance of information extraction in applications. The proposed method is evaluated with both subjective and objective measurements, and the experimental results show that it achieves the expected results and even outperforms the classical algorithm.
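
To illustrate the idea of using the Hough transform to locate visual separators, the following is a minimal sketch of the standard Hough line transform applied to a set of edge pixels. It is not the authors' implementation: the abstract does not specify how the transform is applied, and the function names (`hough_votes`, `find_separators`), the `min_votes` threshold, and the restriction to horizontal and vertical lines are all assumptions made for illustration.

```python
import math
from collections import Counter


def hough_votes(edge_pixels, n_theta=180):
    """Accumulate votes in (rho, theta) space for each edge pixel.

    Each pixel (x, y) votes for every line rho = x*cos(theta) + y*sin(theta)
    passing through it, with theta sampled in n_theta steps over [0, pi).
    """
    votes = Counter()
    for x, y in edge_pixels:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] += 1
    return votes


def find_separators(edge_pixels, min_votes, n_theta=180):
    """Return axis-aligned candidate separators (hypothetical helper).

    Assumes separators on a rendered page are horizontal or vertical, so
    only the theta = 90 degree and theta = 0 degree bins are inspected.
    """
    votes = hough_votes(edge_pixels, n_theta)
    horizontal_bin = n_theta // 2  # theta = 90 degrees, so rho equals y
    vertical_bin = 0               # theta = 0 degrees, so rho equals x
    horizontal = sorted(
        rho for (rho, t), v in votes.items()
        if t == horizontal_bin and v >= min_votes
    )
    vertical = sorted(
        rho for (rho, t), v in votes.items()
        if t == vertical_bin and v >= min_votes
    )
    return {"horizontal": horizontal, "vertical": vertical}


# Toy edge map: one horizontal rule at y = 3, one vertical rule at x = 7,
# plus a stray noise pixel that should not produce a separator.
pixels = (
    {(x, 3) for x in range(10)}
    | {(7, y) for y in range(10)}
    | {(1, 8)}
)
separators = find_separators(pixels, min_votes=8)
```

In this sketch, long straight runs of edge pixels accumulate many votes in a single bin, while isolated pixels do not, which is why a vote threshold can separate genuine separator lines from noise. A practical system would more likely obtain the edge map from a rendered screenshot and use an optimized library routine rather than this pure-Python loop.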

