PLeveraging Django and Redis using Web Scraping

Dr K M* Anandkumar,Abilash R,Abhinav R,Abhinav Raman

doi:10.35940/ijrte.a1916.059120

Abstract

Web scraping is also known as data scraping and it is used for extracting data from sites. The software used for this may directly access the World Wide Web by using the Hypertext Transfer Protocol or by using a web browser. Over the years, due to advancements in web development and its technology, various frameworks have come in use and almost all of websites are dynamic with their content being served from CMS. This makes it tough to extract data since there is no common template for extracting data. Hence, we use RSS. Rich Site Summary is a kind of timeline allowing users and also applications to gain access to the updates on websites in a standardized, computer-readable format. This project combines the use of RSS to extract data from websites and serve users in a robust and easy way. The differentiation is that this project uses server side caching to serve users almost instantaneously without the need to perform data extraction from the requested site all over again. This is done using Redis and Django.

Full Text