The discipline of Information and Communication Technologies for Development (ICT4D) gained traction against the backdrop of exponential growth in mobile phone connectivity. A multitude of projects, services, applications and even policies aim to leverage the mobile phone to contribute to the broader development of society. This has gone hand in hand with considerable academic interest in understanding the effects of mobile phone connectivity on development. However, it is only of late that attention has been paid to posing development-related questions to the basic data artifacts that society leaves behind when consuming mobile phone services. These artifacts belong to the class of Transaction Generated Data (TGD): they are recorded by mobile phone operators when certain events occur (e.g. when one makes a call), for the purposes of billing and network optimization. Given the volumes of TGD produced, it also falls under the category of Big Data. Big data is an amorphous category that could, for instance, include data from an astronomical observatory or the full text of all the digitized books from the 20th century. Like many others, the 2011 McKinsey Global Institute report on Big Data focuses solely on the "big" in defining the term: "Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze" (Manyika et al., 2011). This definition is intentionally subjective and incorporates a moving definition of how big a dataset needs to be in order to be considered big data, with the implicit assumption that as technology advances over time, the size of datasets that qualify as big data will also increase. The definition can also vary by sector, depending on what kinds of software tools are commonly available and what sizes of datasets are common in a particular industry.
With those caveats, big data in many sectors today will range from a few dozen terabytes to multiple petabytes (thousands of terabytes). Gartner (2011) introduced two further definitional characteristics in addition to volume, namely velocity and variety. Velocity refers to the speed at which data is generated, assessed and analyzed. Variety encompasses the fact that data can exist as different media (text, audio, video) and come in different formats (structured and unstructured). Value is a fourth definitional characteristic that acknowledges the potentially high socio-economic value that may be generated by Big Data (Jones, 2012). Included within the scope of Big Data is the category of transaction-generated data (TGD), also sometimes described as "data exhaust." This category was first discussed in 1991, though the term then used was transaction-generated information. The value of this subset of big data is that it is directly connected to human behavior, and its accuracy is generally high because the data is generated for a purpose, such as the completion of a telephone call or a commercial transaction. TGD has great potential for broader development and is already being leveraged to predict flu trends, forecast unemployment, and understand societal ties and overall socio-economic well-being, among other uses. However, unlike in developed countries, the only streams of comprehensive big data with wide socio-economic coverage in developing countries are those generated by telecom networks, because commercial banks and supermarkets, for example, do not reach a majority of people. Internet access is growing fast in developing economies, but as noted in the 2013 Measuring the Information Society report by ITU, overall household internet penetration in developing economies was expected to be only 28% as of end 2013, as opposed to almost 80% in developed economies. Basic mobile subscriptions, however, have almost peaked at 96% globally (ITU, 2013).
Therefore, in the near term, it is non-Internet-related mobile network big data that has the widest socio-economic coverage. Such data is already being utilized for development and monitoring, not just in developed economies but also in developing economies. The focus of this paper is therefore mainly on mobile network big data for development. This policy paper aims to inform policy makers in developing economies about the range of behavioral insights on mobility, connectivity and consumption that can be extracted from mobile network TGD. Importantly, it also addresses how these insights can be leveraged in multiple policy domains, inter alia transport, health, and economic development.