Abstract

Shifting paradigms in Official Statistics have led to widespread use of administrative records to support, or to provide an alternative to, censuses and surveys. At the same time, demand for diversified and detailed information is increasing. To meet this demand, Official Statistics needs to seek new data sources. Internet data sources or, more generally, Big Data could be among them, and the potential usefulness of these new sources of statistical information should not be neglected. The aim of the paper is to assess the representativeness of Internet data sources (IDS) for the real estate market in Poland. These sources could be used to describe demand and supply on the secondary real estate market in a more detailed way than is done with existing methodology. To assess representativeness, information from official surveys and other data sources is used. Owing to the lack of sufficient literature on this issue, original research is conducted to supplement information from official statistics. For the purpose of the paper, Internet data sources are defined. The TERYT register, which contains information on street names, was used to correct information taken from Internet data sources. A dedicated program for automated data collection (a web spider) was developed. All calculations were done with the R statistical software and additional packages (XML, RCurl, httr).

Highlights

  • Increasing information needs at a low level of aggregation encourage the development of small area estimation and stimulate the search for new data sources that could support or enhance existing sources

  • Internet data sources (IDS) and big data have recently become the subject of evaluation by statisticians as potential statistical data sources

  • Despite the increasing interest in these new data sources, several aspects need to be considered before they can meet the criteria for statistical data sources

Summary

Introduction

Increasing information needs at a low level of aggregation encourage the development of small area estimation and stimulate the search for new data sources that could support or enhance existing sources (reporting, censuses or surveys). This process has been under way since the 1970s, when statisticians and National Statistical Institutes (NSIs) started using administrative records and adopting them into their statistical systems (Wallgren and Wallgren 2014). The following sources are discussed in the context of Official Statistics: mobile networks (e.g. to track movement and travel routes), social networking sites (e.g. Facebook, Twitter, LinkedIn), e-commerce (e.g. eBay, Amazon, price comparison services) and Google search trends. They are not being widely investigated as statistical data sources or from the point of view of estimation theory. The article ends with a discussion of the results and final remarks.

Internet data sources
Data sources on real estate market in Poland
Empirical evaluation of representativeness
Summary and discussion