Web scraping is a technique for obtaining information from websites automatically. As online shopping has grown in popularity, it has become an abundant source of information on the prices of goods sold by retailers. Besides significantly reducing the cost of price research, scraped data can improve the precision of inflation estimates and allow prices to be tracked in real time. For this reason, web scraping is a popular research tool both at statistical offices (Eurostat, the British Office for National Statistics, the Belgian Statbel) and at universities (e.g. the Billion Prices Project conducted at the Massachusetts Institute of Technology). However, using scraped data to calculate inflation poses many challenges at the stages of data collection, processing, and aggregation. The aim of the study is to compare various methods of calculating price indices for clothing and footwear on the basis of scraped data. Using data from one of the largest online stores selling clothing and footwear for the period from February 2018 to November 2019, the author compared the results of the chained Jevons index, the GEKS-J index, and the GEKS-J expanding and updating window methods. The calculations confirmed a high chain drift in the chained index and showed very similar results across the expanding and updating window methods (excluding the FBEW method).
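To illustrate the difference between the chained Jevons index and the GEKS-J index, the following is a minimal sketch on invented toy data, not the author's data or implementation. It assumes the textbook definitions: the bilateral Jevons index is the unweighted geometric mean of price relatives over products available in both periods, the chained index multiplies consecutive bilateral indices, and GEKS-J takes the geometric mean of all indirect comparisons through every period in a fixed window. The product names, prices, and function names are hypothetical, and the expanding and updating window methods are not shown.

```python
import math

# Hypothetical scraped prices: one dict per period, product id -> price.
# Product "C" enters the shop in period 1 and is marked down in period 2,
# which is typical of clothing and makes the matched sample change over time.
prices = [
    {"A": 10.0, "B": 10.0},
    {"A": 5.0, "B": 10.0, "C": 20.0},
    {"A": 10.0, "B": 10.0, "C": 10.0},
]


def jevons(prices, s, t):
    """Bilateral Jevons index from period s to period t over matched products."""
    matched = prices[s].keys() & prices[t].keys()
    if not matched:
        raise ValueError(f"no products matched between periods {s} and {t}")
    log_sum = sum(math.log(prices[t][k] / prices[s][k]) for k in matched)
    return math.exp(log_sum / len(matched))


def chained_jevons(prices):
    """Chained index: cumulative product of period-on-period Jevons indices."""
    index = [1.0]
    for t in range(1, len(prices)):
        index.append(index[-1] * jevons(prices, t - 1, t))
    return index


def geks_jevons(prices):
    """GEKS-J over one fixed window, with period 0 as the base."""
    n = len(prices)
    index = []
    for t in range(n):
        # Geometric mean of the indirect comparisons 0 -> l -> t for every l.
        log_sum = sum(
            math.log(jevons(prices, 0, l) * jevons(prices, l, t))
            for l in range(n)
        )
        index.append(math.exp(log_sum / n))
    return index


chained = chained_jevons(prices)
geks = geks_jevons(prices)
direct = jevons(prices, 0, 2)
print("chained:", chained)
print("GEKS-J: ", geks)
print("direct 0 -> 2:", direct)
```

In this toy example the prices of A and B in the last period equal those in the first, so the direct comparison is 1.0, yet the chained index ends at about 0.71 because the marked-down product C enters the matched sample midway. GEKS-J ends at about 0.89, closer to the direct comparison; the size of such effects on real data depends on the sample and the window length.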