Snorkel: Rapid Training Data Creation with Weak Supervision.

Alexander Ratner, Henry Ehrenberg, Sen Wu, Jason Fries, Stephen H Bach, Christopher Ré

doi:10.14778/3157794.3157797

Abstract

Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research labs. In a user study, subject matter experts build models 2.8× faster and increase predictive performance by an average of 45.5% versus seven hours of hand labeling. We study the modeling tradeoffs in this new setting and propose an optimizer for automating tradeoff decisions that gives up to 1.8× speedup per pipeline execution. In two collaborations, with the U.S. Department of Veterans Affairs and the U.S. Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides 132% average improvements to predictive performance over prior heuristic approaches and comes within an average 3.60% of the predictive performance of large hand-curated training sets.
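
To make the idea of labeling functions concrete, here is a minimal sketch in plain Python. It does not use the Snorkel library, and every name in it (the three `lf_*` heuristics, `apply_lfs`, `majority_vote`, the example sentences) is hypothetical. The combining step is a simple majority vote, used only as a stand-in: the data programming approach described in the abstract instead estimates the accuracies and correlations of the labeling functions without ground truth, which a majority vote does not do.

```python
from collections import Counter

# Label conventions assumed for this sketch only.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_contains_causes(text):
    """Hypothetical heuristic: the word 'causes' suggests a positive example."""
    return POSITIVE if "causes" in text.lower() else ABSTAIN


def lf_contains_negation(text):
    """Hypothetical heuristic: an explicit 'not' suggests a negative example."""
    return NEGATIVE if "not" in text.lower().split() else ABSTAIN


def lf_short_sentence(text):
    """Hypothetical heuristic: very short sentences rarely state a relation."""
    return NEGATIVE if len(text.split()) < 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_causes, lf_contains_negation, lf_short_sentence]


def apply_lfs(texts, lfs):
    """Build a label matrix: one row per example, one column per labeling function."""
    return [[lf(text) for lf in lfs] for text in texts]


def majority_vote(votes):
    """Combine one row of votes; abstain when no function votes or the top two tie.

    This is a naive baseline, not the denoising model described in the abstract.
    """
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


texts = [
    "Aspirin causes stomach irritation in some patients",
    "Aspirin does not cause drowsiness",
    "No relation",
]
label_matrix = apply_lfs(texts, LABELING_FUNCTIONS)
labels = [majority_vote(row) for row in label_matrix]
```

Each labeling function may abstain, so the label matrix is sparse and noisy by design; the interesting work lies in how the votes are combined, which is where a learned model of the functions' accuracies would replace `majority_vote`.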

Journal: Proceedings of the VLDB Endowment
Publication Date: Nov 1, 2017
Citations: 466

Similar Papers

Snorkel: rapid training data creation with weak supervision
Alexander Ratner ... Jason Fries
The VLDB Journal | VOL. 29
15 Jul 2019

Freedom is not free: Examining health equity for racial and ethnic minoritized veterans.
Tiffany J Riser ... Bonnie Mowinski Jennings
Research in nursing & health | VOL. 46
16 Mar 2023

Reintegration Problems and Treatment Interests Among Iraq and Afghanistan Combat Veterans Receiving VA Medical Care
Nina Sayer ... Maureen Murdoch
Psychiatric Services | VOL. 61
01 Jun 2010

Diabetes Treatment Among VA Patients With Comorbid Serious Mental Illness
S L Krein ... C R Bingham
Psychiatric Services | VOL. 57
01 Jul 2006
