How to apply zero-shot learning to text data in substance use research: An overview and tutorial with media data.

Benjamin Riordan, Dan Anderson-Luxford, Emmanuel Kuntsche, Zhen He, Abraham Albert Bonela, Aiden Nibali

doi:10.1111/add.16427

Open Access

https://doi.org/10.1111/add.16427

Journal: Addiction (Abingdon, England)	Publication Date: Jan 11, 2024
Citations: 1	License type: CC BY-NC 4.0

Affiliation: University of Melbourne

Abstract

A vast amount of media-related text data is generated daily in the form of social media posts, news stories or academic articles. These text data provide opportunities for researchers to analyse and understand how substance-related issues are being discussed. The main methods for analysing large volumes of text data (content analyses or specifically trained deep-learning models) require substantial manual annotation and resources. A machine-learning approach called 'zero-shot learning' may be quicker, more flexible and require fewer resources. Zero-shot learning uses models trained on large, unlabelled (or weakly labelled) data sets to classify previously unseen data into categories on which the model has not been specifically trained. This means that a pre-existing zero-shot learning model can be used to analyse media-related text data without the need for task-specific annotation or model training. This approach may be particularly important for analysing time-critical data. This article describes the relatively new concept of zero-shot learning and how it can be applied to text data in substance use research, including a brief practical tutorial.
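The idea of classifying text into categories the model was never specifically trained on can be made concrete with a short sketch. The example below is a minimal illustration, not the authors' tutorial code: it assumes the Hugging Face `transformers` library and the publicly available `facebook/bart-large-mnli` model, and the category names, example posts and helper functions (`load_classifier`, `top_label`, `classify_texts`, `run_demo`) are hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical categories for illustration; they are not taken from the article.
CANDIDATE_LABELS = ["alcohol", "tobacco or vaping", "cannabis", "not substance related"]


def load_classifier(model_name: str = "facebook/bart-large-mnli") -> Callable:
    """Load a pre-trained zero-shot classifier (assumed model; downloaded on first use)."""
    # Imported lazily so the helpers below work without the library installed.
    from transformers import pipeline
    return pipeline("zero-shot-classification", model=model_name)


def top_label(result: Dict) -> str:
    """Return the label with the highest score from one classifier result."""
    scores = result["scores"]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return result["labels"][best]


def classify_texts(texts: List[str], labels: List[str], classifier: Callable) -> List[Dict]:
    """Assign each text to one of `labels` without any task-specific training."""
    rows = []
    for text in texts:
        # The classifier returns a dict with parallel "labels" and "scores" lists.
        result = classifier(text, candidate_labels=labels)
        rows.append({"text": text, "label": top_label(result), "score": max(result["scores"])})
    return rows


def run_demo() -> None:
    """Classify a few made-up posts; requires `transformers` and a model download."""
    posts = [
        "Nothing beats a cold beer after work on Friday.",
        "New vape flavours just arrived in store.",
        "Traffic on the freeway was terrible this morning.",
    ]
    classifier = load_classifier()
    for row in classify_texts(posts, CANDIDATE_LABELS, classifier):
        print(f"{row['label']:<25} {row['score']:.2f}  {row['text']}")
```

Calling `run_demo()` downloads the model and prints one predicted category and score per post. Because the categories are passed in as plain text at prediction time, they can be changed without annotating data or retraining a model, which is the flexibility the abstract describes.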

Full Text