Synthetic training data promise considerable performance improvements in machine learning (ML) surveillance tasks, including applications such as crowd counting, pedestrian tracking, and face recognition. In this context, synthetic training data constitute techno-fixes primarily by virtue of acting as “edge cases”—data that are hard to come by in the “real world” yet straightforward to produce synthetically—which are used to enhance ML systems’ resilience. In this dialogue paper, I mobilize Haggerty and Ericson’s (2000) concept of the surveillant assemblage to argue that synthetic training data raise well-known, entrenched surveillance issues. Specifically, I contend that conceptualizing synthetic data as but one component of larger surveillant assemblages is analytically meaningful because it challenges techno-deterministic imaginaries that posit synthetic data as fixes to deep-rooted surveillance issues. To exemplify this stance, I draw on several examples of how synthetic training data are already used, illustrating how they may both intensify the disappearance of disappearance and contribute to the leveling of hierarchies of surveillance, depending on the surveillant assemblage that they reconfigure. Overall, this intervention urges surveillance studies scholarship to attend to how synthetic data reconfigure specific surveillant assemblages, with both problematic and emancipatory implications.