Abstract

Responsible computing ultimately requires that technical communities develop and adopt tools, processes, and practices that mitigate harms and support human flourishing. Prior efforts toward the responsible development and use of datasets, machine learning models, and other technical systems have led to the creation of documentation toolkits to facilitate transparency, diagnosis, and inclusion. This work takes the next step: to catalyze community uptake, alongside toolkit improvement. Specifically, starting from one such proposed toolkit specialized for language datasets, data statements for natural language processing, we explore how to improve the toolkit in three senses: (1) the content of the toolkit itself, (2) engagement with professional practice, and (3) moving from a conceptual proposal to a tested schema that the intended community of use may readily adopt. To achieve these goals, we first conducted a workshop with natural language processing practitioners to identify gaps and limitations of the toolkit as well as to develop best practices for writing data statements, yielding an interim improved toolkit. Then we conducted an analytic comparison between the interim toolkit and another documentation toolkit, datasheets for datasets. Based on these two integrated processes, we present our revised Version 2 schema and best practices in a guide for writing data statements. Our findings more generally provide integrated processes for co-evolving both technology and practice to address ethical concerns within situated technical communities.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.