Abstract

In recent years, Official Statistics has acknowledged the value of Big Data and has started exploring the use of diverse sources in several domains. For some of these sources, the object can, apart from privacy restrictions, be easily related to a statistical unit. In those cases, if a unit identifier were available, linking Big Data to existing statistical data at the micro level could extend the content, coverage, accuracy and timeliness of official statistics. This could be the case, for instance, for Internet-scraped data. In this setting, new challenges arise for data integration experts in official statistics, because of the deep differences from the familiar framework in which administrative data have long been integrated to produce statistical outputs, i.e. business and population registers. In this study, taking a real case as a starting point, we describe the novelties and challenges in integrating Internet-scraped data with traditional statistical datasets, from the entity extraction and recognition phases to the unit matching algorithms. As a case study, we propose the linkage of Internet-scraped information with data on agritourisms as reported in the Italian Farm Register, which is obtained by integrating seven administrative sources. To overcome the limits and rigidities of the well-established linkage procedures, we explore new techniques not yet introduced in the official statistics production system. Finally, we devote due attention to evaluating output quality, in order to fully understand the benefits and risks of the integration and to allow analysts to account for potential integration errors in subsequent analyses.

