Abstract
Rapid advancements in machine learning techniques allow mass surveillance to be applied at ever larger scales and to draw on ever more personal data. These developments demand reconsideration of the privacy-security dilemma, which describes the trade-offs between national security interests and individual privacy concerns. By investigating mass surveillance techniques that use bulk data collection and machine learning algorithms, we show why these methods are unlikely to pinpoint terrorists in order to prevent attacks. The diverse characteristics of terrorist attacks—especially when considering lone-wolf terrorism—lead to irregular and isolated (digital) footprints. This irregularity degrades the accuracy of machine learning algorithms, and hence of the mass surveillance that depends on them, which can be explained by three well-known problems in machine learning theory: class imbalance, the curse of dimensionality, and spurious correlations. Proponents of mass surveillance often invoke the distinction between collecting data and metadata, in which the latter is understood as a lesser breach of privacy. Their arguments commonly overlook the ambiguity in the definitions of data and metadata and ignore the ability of machine learning techniques to infer the former from the latter. Given the sparsity of datasets used for machine learning in counterterrorism and the privacy risks attendant on bulk data collection, policymakers and other relevant stakeholders should critically re-evaluate the likelihood of success of the algorithms and the collection of data on which they depend.
Highlights
In the past decades, governments around the world have increased their use of automated intelligence for the collection and evaluation of data in efforts to ensure national security
The PRISM programme run by the US National Security Agency (NSA) became one of the best-known examples of large-scale wiretapping operations after the leaks by Edward Snowden in 2013
We suggest that theoretical challenges inherent to machine learning techniques, i.e., class imbalance, the curse of dimensionality, and spurious correlations, should be considered in determining, case by case, the likely and actual efficacy of national security strategies
Summary
Governments around the world have increased their use of automated intelligence for the collection and evaluation of data in efforts to ensure national security. Many governments, including those of the United Kingdom, Germany, Sweden, France, and the Netherlands, also run ‘upstreaming’ programmes (i.e., direct tapping into communications infrastructure for data interception). Their goal is to detect suspicious behaviour of individuals within a large group of citizens (Bigo et al. 2013). These surveillance tactics were frequently defended on the grounds that the collection of data had been confined to metadata (i.e., data about data), not actual data, and so was less intrusive (a claim that we challenge in the third section). Opposition to these tactics, in contrast to the general acceptance of targeted surveillance of individuals who have aroused suspicion, can be explained by the privacy-security dilemma (Van den Hoven et al. 2012). The privacy-security dilemma describes the trade-off between people’s right to privacy and their right to security, whereby the challenge lies in finding a reasonable balance between the two.
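The class-imbalance problem mentioned above can be illustrated with a back-of-the-envelope base-rate calculation: when genuine attackers are extremely rare in the surveilled population, even a classifier with a very low false-positive rate flags vastly more innocent people than actual suspects. The following sketch uses purely illustrative numbers (population size, prevalence, and error rates are assumptions for demonstration, not figures from this paper):

```python
def flagging_outcomes(population, prevalence, sensitivity, false_positive_rate):
    """Expected outcomes when a classifier flags 'suspicious' individuals.

    Returns the expected number of correctly flagged positives, falsely
    flagged negatives, and the resulting precision of the flags.
    """
    positives = population * prevalence
    negatives = population - positives
    true_flags = positives * sensitivity
    false_flags = negatives * false_positive_rate
    precision = true_flags / (true_flags + false_flags)
    return true_flags, false_flags, precision

# Illustrative assumptions: 3,000 attackers among 300 million people,
# a classifier with 99% sensitivity and only a 0.1% false-positive rate.
true_flags, false_flags, precision = flagging_outcomes(
    population=300_000_000,
    prevalence=3_000 / 300_000_000,
    sensitivity=0.99,
    false_positive_rate=0.001,
)
print(f"correctly flagged: {true_flags:,.0f}")   # ~2,970
print(f"falsely flagged:   {false_flags:,.0f}")  # ~300,000
print(f"precision:         {precision:.2%}")     # under 1%
```

Even with these optimistic error rates, fewer than one in a hundred flagged individuals would be an actual attacker, which is the core of the class-imbalance objection to algorithmic mass surveillance.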