Data protection, like almost everything else in our lives, is challenged by the advent of ‘big data’. The Economist reports in its 2012 Outlook that the quantity of global digital data expanded from 130 exabytes in 2005 to 1,227 exabytes in 2010, and is predicted to rise to 7,910 exabytes in 2015. An exabyte is a quintillion bytes. If you find that hard to visualize, consider this: someone has calculated that if you loaded an exabyte of data onto DVDs in slimline jewel cases, and then loaded those cases into Boeing 747 aircraft, it would take 13,513 planes to transport one exabyte of data. Using DVDs to move the data collected globally in 2010 would require a fleet of more than 16 million jumbo jets. And exabytes are rapidly becoming passé. The volume of stored information in the world is growing so fast that scientists have had to coin new terms, including ‘zettabyte’ and ‘yottabyte’, to describe the flood of data.

The importance of big data is not just a result of its size or how fast it is growing (about 60 per cent a year), but also of the remarkable array of sources from which the data come. The Internet captures lots of data. Facebook alone has more than 800 million active users, more than half of whom log in every day; they generate more than 900 million web pages and upload more than 250 million photos every day. In 2010, a lifetime ago in Internet time, Google sites were used by more than 1 billion unique visitors every month, who spent a collective 200 billion minutes on its sites. Google-owned YouTube passed 1 trillion video playbacks in 2011. Email, instant messaging, VOIP calls, and other communications generate tens of trillions of recorded messages every year. Credit and debit cards, checks, and other financial activities provide a steady stream of billions of recorded transactions every month.
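As a rough sanity check on the figures quoted above, a back-of-the-envelope calculation is easy to run. The assumptions here are conventional but not from the text: an exabyte is taken as 10^18 bytes, and a single-layer DVD as holding 4.7 GB. The same arithmetic also shows that the 2005-to-2010 figures imply an annual growth rate close to the ‘about 60 per cent a year’ cited.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumptions (not from the text): 1 exabyte = 10**18 bytes,
# and a single-layer DVD holds 4.7 GB (4.7 * 10**9 bytes).

EXABYTE = 10**18
DVD_CAPACITY = 4.7 * 10**9  # bytes per single-layer DVD

dvds_per_exabyte = EXABYTE / DVD_CAPACITY
print(f"DVDs per exabyte: {dvds_per_exabyte:,.0f}")  # roughly 213 million discs

# Implied compound annual growth rate from 130 EB (2005) to 1,227 EB (2010):
growth = (1227 / 130) ** (1 / 5) - 1
print(f"Implied annual growth: {growth:.0%}")  # about 57%, close to the cited 60%
```

Spread across 13,513 aircraft, those 213 million discs work out to roughly 16,000 per plane, so whatever packing assumptions the original calculation used, the order of magnitude of the fleet sizes quoted is at least plausible.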
And increasingly, sensor networks—video surveillance cameras, embedded computers in automobiles, the more than 5 billion cell phones we carry—record locations, movements, and activities. We can now talk meaningfully about ubiquitous data collection, in which almost everything we do results in data being captured and stored by one or more third parties. It is significant that those data are digital: they can be stored, shared, searched, combined, and duplicated with extraordinary speed and at very little cost. And they are accompanied by metadata—data about when, where, and how the underlying information was generated. Some experts estimate that there may be five times more metadata than the information we are aware of creating, and this metadata can be extraordinarily revealing.

We used to define ‘big data’ as data sets so large that a supercomputer was needed to process them, but another aspect of big data is that analytical capacity has not only soared; it has also become far less expensive and more widely distributed. It is not just that today’s mobile devices have more computing power than the desktop machines of a decade ago, but also that we can now link data and computers virtually, so that huge computational tasks can be undertaken affordably and conveniently. In fact, we are witnessing the movement of more of that computational power, along with the storage of the tidal wave of data we are generating and collecting, into the ‘cloud’. Cloud computing is all the rage, and despite the overuse and misuse of the term, it is increasingly clear that many of the data and resources we used to believe we had to possess locally—in computers, handheld devices, entertainment systems, and business record systems—can now be provided remotely, with greater security and reliability and at lower cost. When thinking about the importance of ‘big data’, it is critical to remember that access to so much data, from so many different sources, and to the computing