Abstract

Web news has become an important information resource, and collecting and analyzing it can yield desired information. This paper proposes an effective and efficient Web-based knowledge acquisition approach that extracts the full content of Web news from news site databases, using site-side news search engines as query interfaces. Rather than crawling news sites to collect news pages, we use the search engines affiliated with the news sites to search for the desired articles directly in the news site databases. We submit search keywords to these search engines and extract the full content of the returned news articles without machine learning or pattern matching. The approach is applicable to general news sites, and experimental results show that it can extract a large amount of Web news content from news site databases automatically, quickly, and accurately.
