Abstract

The World Wide Web is the main source of information for many organizations and ordinary users. However, analyzing and selecting web content is still, in many cases, an arduous manual task. When a query is sent to a web search engine, the user receives a list of URLs, often ranked by popularity (for example, by Google's PageRank algorithm). The user must then read and analyze each linked page to find the relevant information. This work presents a method that automatically constructs a text report, induced by a web query, from a set of URLs. The method extracts text slices (excerpts) from web pages, using the text most similar to the web query as the slicing criterion. A slice is composed of document object model (DOM) nodes, and similarity is computed with standard natural language processing techniques.
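The abstract does not name the similarity measure. The following is a minimal sketch that assumes a bag-of-words cosine similarity. That is one standard NLP technique, but it is not necessarily the one used in this work. The code ranks the text nodes of a page's DOM against a query, and the top-ranked node would play the role of the slicing criterion. The names `TextNodeCollector`, `tokenize`, `cosine` and `rank_nodes` are hypothetical. Building the full slice from that criterion is not shown.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the path stack.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class TextNodeCollector(HTMLParser):
    """Collects (tag_path, text) pairs for every non-empty text node (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []  # names of currently open elements, outermost first
        self.nodes = []  # collected (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def tokenize(text):
    """Lowercased word tokens; an assumed, deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_nodes(html, query):
    """Return (score, path, text) for each DOM text node, most similar to the query first."""
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), path, text) for path, text in parser.nodes]
    return sorted(scored, key=lambda s: s[0], reverse=True)


page = (
    "<html><body><h1>Weather</h1>"
    "<p>Rain is expected in Valencia tomorrow.</p>"
    "<div>Contact us</div></body></html>"
)
best_score, best_path, best_text = rank_nodes(page, "rain in Valencia")[0]
print(best_path, round(best_score, 3))  # html/body/p 0.707
```

In this toy page, the `<p>` node shares three terms with the query, so it is selected. The heading and footer nodes score zero. A realistic version would also skip `<script>` and `<style>` content and might weight terms (for example, with TF-IDF) instead of using raw counts.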
