Collect, Measure, Repeat: Reliability Factors for Responsible AI Data Collection

Oana Inel, Tim Draws, Lora Aroyo

doi:10.1609/hcomp.v11i1.27547

Abstract

The rapid entry of machine learning approaches into our daily activities and high-stakes domains demands transparency and scrutiny of their fairness and reliability. To help gauge machine learning models' robustness, research typically focuses on the massive datasets used for their deployment, e.g., by creating and maintaining documentation for understanding their origin, process of development, and ethical considerations. However, data collection for AI is still typically a one-off practice, and datasets collected for a certain purpose or application are often reused for a different problem. Additionally, datasets may not remain representative over time, may contain ambiguous or erroneous annotations, or may fail to generalize across issues or domains. Recent research has shown that these practices might lead to unfair, biased, or inaccurate outcomes. We argue that data collection for AI should be performed in a responsible manner, where the quality of the data is thoroughly scrutinized and measured through a systematic set of appropriate metrics. In this paper, we propose a Responsible AI (RAI) methodology designed to guide data collection with a set of metrics for an iterative, in-depth analysis of the factors influencing the quality and reliability of the generated data. We propose a granular set of measurements to inform on the internal reliability of a dataset and its external stability over time. We validate our approach across nine existing datasets and annotation tasks and four content modalities. This approach impacts the assessment of the robustness of data used for AI applied in the real world, where diversity of users and content is prominent. Furthermore, it addresses fairness and accountability aspects in data collection by providing systematic and transparent quality analysis for data collections.

Journal: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing
Publication Date: Nov 3, 2023
Citations: 2
