Abstract

Web page filtering aims to remove the large volume of repeated and off-topic noise information in web pages and to retain the useful information. Some web filtering methods cannot make full use of layout and visual features. In view of the “DIV+CSS” design style that has become mainstream for modern commercial web sites, this paper observes that elements lying in the same div block share common semantic features, and it proposes a DIV_FOREST model to represent web pages. In combination with the Vision-based Page Segmentation Algorithm, a DVPS Algorithm that considers both layout features and visual features is proposed to improve web page filtering efficiency.
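
The abstract does not spell out how DIV_FOREST is built, so the following is only a minimal sketch of one plausible reading: each top-level div block becomes the root of a tree, nested div blocks become children, and text is attached to the innermost enclosing div. The names `DivNode`, `DivForestBuilder`, and `build_div_forest` are hypothetical and are not taken from the paper; the sketch covers only the layout grouping and none of the visual features used by DVPS.

```python
from html.parser import HTMLParser


class DivNode:
    """One <div> block: its attributes, its text, and its nested div blocks."""

    def __init__(self, attrs):
        self.attrs = dict(attrs)
        self.text = []      # text fragments whose innermost enclosing div is this one
        self.children = []  # nested DivNode objects


class DivForestBuilder(HTMLParser):
    """Hypothetical builder: collects top-level <div> elements as forest roots."""

    def __init__(self):
        super().__init__()
        self.roots = []  # top-level div blocks (the forest)
        self.stack = []  # div blocks that are currently open

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        node = DivNode(attrs)
        if self.stack:
            # A div inside another div becomes a child of the open block.
            self.stack[-1].children.append(node)
        else:
            # A div with no open ancestor div starts a new tree.
            self.roots.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Text outside every div is ignored in this sketch.
        if text and self.stack:
            self.stack[-1].text.append(text)


def build_div_forest(html):
    """Parse an HTML string and return the list of root DivNode objects."""
    builder = DivForestBuilder()
    builder.feed(html)
    builder.close()
    return builder.roots
```

Calling `build_div_forest` on a page with a navigation div and a content div yields two roots, so elements that share a div block can be scored or discarded together. A filtering step built on top of this grouping would still need the semantic and visual criteria that the paper itself defines.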
