Abstract

The information available on the internet is growing at a very high rate. News articles in particular are added and updated around the clock. News retrieval systems in use today are not capable of handling such large volumes of news articles effectively and accurately. Because of the need for frequent and intensive processing, a news retrieval system needs to be scalable, robust and fault tolerant. This can be achieved through the use of cloud technology. A news retrieval system on the cloud can fetch, process and organize news articles, and can support faster and more accurate retrieval. It can be made to operate with little or no supervision. Cloud Press, the next-generation news retrieval system presented here, is designed and implemented to overcome most of the pitfalls of the news retrieval systems in place today. It uses the MapReduce paradigm to fetch, process and organize news articles in a distributed fashion. The MapReduce approach splits each task into sub-tasks and assigns them to nodes in the cloud; the nodes complete the sub-tasks, and their results are consolidated into one final output. Processing time is thereby greatly reduced. Cloud Press uses various novel algorithms for parallel crawling of the web and distributed processing of news articles. A distributed database is used to store and index the news articles. The retrieval system also includes a query expansion feature for searching news articles, and a novel visualization technique is used to display the retrieved articles.
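
The split, assign and consolidate flow of MapReduce described above can be illustrated with a minimal sketch. It is not the Cloud Press implementation, which the abstract does not detail: the sample articles, the function names (map_article, shuffle, reduce_term, build_index) and the choice of an inverted index as the output are all assumptions made for illustration, and local threads merely stand in for cloud nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: (article_id, text) pairs standing in for fetched articles.
ARTICLES = [
    ("a1", "cloud news retrieval"),
    ("a2", "news on the cloud"),
    ("a3", "distributed news processing"),
]


def map_article(article):
    """Map step: emit one (term, article_id) pair per distinct term in an article."""
    article_id, text = article
    return [(term, article_id) for term in set(text.lower().split())]


def shuffle(mapped_chunks):
    """Group the emitted article ids by term, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for chunk in mapped_chunks:
        for term, article_id in chunk:
            groups[term].append(article_id)
    return groups


def reduce_term(term, article_ids):
    """Reduce step: consolidate one term's article ids into a sorted posting list."""
    return term, sorted(article_ids)


def build_index(articles, workers=3):
    """Run map tasks in parallel, then shuffle and reduce into a single index."""
    # Assumption: threads simulate nodes; each sub-task processes one article.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_article, articles))
    groups = shuffle(mapped)
    return dict(reduce_term(term, ids) for term, ids in groups.items())


if __name__ == "__main__":
    index = build_index(ARTICLES)
    print(index["news"])   # articles containing "news"
    print(index["cloud"])  # articles containing "cloud"
```

Each article is an independent sub-task, so the map step can be spread across workers without coordination; only the shuffle and reduce steps need to see results from every worker, which is where the consolidation into one final output takes place.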

