Abstract

An accurate ground truth is key both to our ability to engineer systems that represent and respond to human context and, perhaps even more importantly, to our ability to develop interfaces that effectively represent datasets to the humans who use and work with the system. Data annotation is therefore of tremendous value in building and exploiting pervasive systems. However, it is also among the most expensive activities in many ubiquitous or pervasive system engineering processes, and it presents significant practical challenges to implementer and participant alike: methods of annotation are frequently based either on active user interaction, which is potentially disruptive to the user's routine, time-consuming and often off-putting, or on a form of surveillance such as video or audio data collection, which enables post-hoc manual annotation of the material. In this talk, I briefly review legal and practical challenges to annotation through the lens of practical experience in a real-world digital health engineering context, and I discuss strategies through which the task of annotation may be approached. The costs involved include both the financial cost of manual annotation and the toll that annotation strategies may take on participants in terms of time or exposure to surveillance technologies. The decisions taken in choosing annotation platforms, methods and proposed outcomes may have a significant impact on the quality of the annotations, on their applicability to particular engineering tasks and, importantly, on the quality of the datasets resulting from real-world data collection processes. As participants volunteer their time and, often, access to their private homes to support engineering research and development, we should consider our responsibilities from an ethical engineering perspective: collecting sensitive personal information in a manner that respects the requirements of the dataset, the goals of the engineering process and the participants' own goals and interests. Finally, I identify open challenges: do we always endeavour to build security and privacy assessment into our research, and to seek ongoing participant feedback and consent? Do we actively involve engineers in the design and pilot stages of our annotation work, evaluating the quality of annotations against key competency questions before commencing data collection? And, in view of the many recent controversies surrounding the collection of ground-truth information, what more can the sector do to ensure that a broad variety of perspectives are represented in the engineering process and are clearly communicated to participants, system users and users of the datasets we build?
