A large-scale and PCR-referenced vocal audio dataset for COVID-19

Jobie Budd, Kieran Baker, Emma Karoune, Harry Coppock, Selina Patel, Richard Payne, Ana Tendero Cañadas, Alexander Titcomb, David Hurley, Sabrina Egglestone, Lorraine Butler, Jonathon Mellor, George Nicholson, Ivan Kiskin, Vasiliki Koutra, Radka Jersakova, Rachel A Mckendry, Peter Diggle, Sylvia Richardson, Björn W Schuller, Steven Gilmour, Davide Pigoli, Stephen Roberts, Josef Packham, Tracey Thornley, Chris Holmes

doi:10.1038/s41597-024-03492-w

Abstract

The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the ‘Speak up and help beat coronavirus’ digital survey alongside demographic, symptom and self-reported respiratory condition data. Digital survey submissions were linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,565 of 72,999 participants and 24,105 of 25,706 positive cases. Respiratory symptoms were reported by 45.6% of participants. This dataset has additional potential uses for bioacoustics research, with 11.3% of participants self-reporting asthma, and 27.2% having linked influenza PCR test results.
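The linkage counts quoted in the abstract can be turned into proportions with simple arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, the counts are copied from the abstract, and it assumes each "X of Y" pair means X linked records out of Y total.

```python
# Counts copied from the abstract; the names are illustrative, not from the paper.
LINKED_PARTICIPANTS = 70_565
TOTAL_PARTICIPANTS = 72_999
LINKED_POSITIVES = 24_105
TOTAL_POSITIVES = 25_706


def percentage(part: int, whole: int) -> float:
    """Return part / whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Assumption: "X of Y" in the abstract means X linked records out of Y total.
participant_linkage = percentage(LINKED_PARTICIPANTS, TOTAL_PARTICIPANTS)
positive_linkage = percentage(LINKED_POSITIVES, TOTAL_POSITIVES)

print(f"Participants with a linked PCR result: {participant_linkage:.1f}%")
print(f"Positive cases with a linked PCR result: {positive_linkage:.1f}%")
```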

Journal: Scientific Data
Publication Date: Jun 27, 2024
Citations: 1
License type: CC BY
